A machine learning approach for Urdu text sentiment analysis

Muhammad Akhtar; Rana Saud Shoukat; Saif Ur Rehman

doi:10.22581/muet1982.2302.09

Muhammad Akhtar: University Institute of Information Technology, PMAS Arid Agriculture University, Rawalpindi, Pakistan
Rana Saud Shoukat: University Institute of Information Technology, PMAS Arid Agriculture University, Rawalpindi, Pakistan
Saif Ur Rehman: University Institute of Information Technology, PMAS Arid Agriculture University, Rawalpindi, Pakistan

Keywords: Machine Learning, Sentiment Analysis, Urdu Text, Roman Urdu Language, LSTM, Lexicon, Neural Networks, Feature Extractor, Sentiment Extraction, Evaluation Measures

Abstract

The rise of social networking sites and blogs has made product evaluations, ratings, and other forms of online expression increasingly common. This rapidly expanding body of data has made sentiment analysis a new area of study for computational linguists. For English, the problem has been studied for roughly a decade. However, the research community has paid far less attention to other important languages, such as Urdu. Urdu is morphologically one of the most complex languages in the world, and characteristics such as its unusual morphology and unrestricted word order make Urdu language processing a difficult challenge. This research presents a new framework for classifying sentiment in Urdu text. Its main contributions are to show the importance of this multidimensional research problem and to address its technical components, such as the parsing algorithm, corpus, and lexicon. The proposed approach to Urdu text sentiment analysis comprises data gathering, pre-processing, feature extraction, feature vector formation, and, finally, sentiment classification. The results and discussion section gives a comprehensive comparison of the proposed work with the standard baseline method in terms of precision, recall, F-measure, and accuracy on three different types of datasets. In the overall comparison of the models, the proposed work achieves encouraging results in accuracy and the other metrics. The paper also outlines future trends and possible directions for the current work.
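
As an illustration of the evaluation measures named in the abstract, the following is a minimal sketch of how precision, recall, F-measure, and accuracy are conventionally computed for a binary sentiment task. It is not the authors' code: the function name `evaluate`, the labels "pos" and "neg", and the toy predictions are assumptions made for this example only.

```python
from collections import Counter


def evaluate(y_true, y_pred, positive="pos"):
    """Return precision, recall, F-measure and accuracy for one positive label.

    The label names are hypothetical; any hashable labels work.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1  # correctly predicted positive
        elif pred == positive:
            counts["fp"] += 1  # predicted positive, actually negative
        elif truth == positive:
            counts["fn"] += 1  # missed positive
        else:
            counts["tn"] += 1  # correctly predicted negative

    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Toy gold labels and predictions (illustrative, not from the paper's datasets).
gold = ["pos", "pos", "pos", "neg"]
predicted = ["pos", "pos", "neg", "neg"]
scores = evaluate(gold, predicted)
print(scores)
```

For a task with more than two sentiment classes, the same counts would typically be computed per class and then averaged; which averaging scheme the paper uses is not stated in the abstract.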