Parts-of-speech tagger for Sindhi language using deep neural network architecture

  • Adnan Ali Memon, Department of Software Engineering, BUITEMS University, Quetta, Pakistan
  • Saman Hina, Department of Computer Science and Information Technology, NED University of Engineering and Technology, Karachi, Pakistan
  • Abdul Karim Kazi, Department of Computer Science and Information Technology, NED University of Engineering and Technology, Karachi, Pakistan
  • Saad Ahmed, Department of Computer Science, Iqra University, Karachi, Pakistan

Abstract

Language is a fundamental medium for human communication, encompassing spoken and written forms, each governed by grammatical rules. Sindhi, one of the oldest languages, is characterized by its rich morphology and grammatical structure. Part-of-speech (POS) tagging, a crucial process in natural language processing, involves assigning grammatical tags to words. This research presents a novel approach to POS tagging for Sindhi text using deep learning techniques. We developed a POS tagger employing Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models, with LSTM proving the more effective of the two. This study represents the first application of these deep learning methods to POS tagging in Sindhi. Using fastText, we trained 79,959 Sindhi word vectors, derived from a corpus compiled from diverse sources including Sindhi books, stories, and poetry. The corpus comprises 1,459 sentences and 10,584 unique words, split into 80% for training and 20% for validation. Our results indicate that the LSTM model achieved an accuracy of 85.80%, outperforming the GRU model, which achieved 80.77%, by a margin of about 5 percentage points. This work's novelty lies in the application of deep learning techniques to enhance POS tagging accuracy on a Sindhi language corpus.
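The evaluation setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the shuffled, sentence-level split, the token-level definition of accuracy, the helper names `split_sentences` and `token_accuracy`, and the placeholder data are all assumptions made for illustration, since the abstract states only the 80/20 ratio and the final accuracy figures.

```python
import random


def split_sentences(sentences, train_fraction=0.8, seed=0):
    """Shuffle tagged sentences and split them into training and validation sets.

    Assumption: the split is made per sentence after shuffling; the abstract
    gives only the 80/20 ratio.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def token_accuracy(gold, predicted):
    """Return the fraction of tokens whose predicted tag equals the gold tag.

    Assumption: accuracy is measured per token, which is common for POS tagging.
    """
    correct = 0
    total = 0
    for gold_tags, predicted_tags in zip(gold, predicted):
        for gold_tag, predicted_tag in zip(gold_tags, predicted_tags):
            correct += int(gold_tag == predicted_tag)
            total += 1
    return correct / total if total else 0.0


# Placeholder corpus: 1,459 dummy one-word sentences stand in for the real data.
corpus = [[("w%d" % i, "NOUN")] for i in range(1459)]
train, valid = split_sentences(corpus)
print(len(train), len(valid))

# Toy gold and predicted tag sequences (hypothetical tags, not from the paper).
gold = [["NOUN", "VERB"], ["ADJ", "NOUN", "VERB"]]
pred = [["NOUN", "VERB"], ["NOUN", "NOUN", "VERB"]]
print(token_accuracy(gold, pred))

# Gap between the reported LSTM and GRU accuracies, in percentage points.
margin = round(85.80 - 80.77, 2)
print(margin)
```

Under these assumptions, an 80/20 split of 1,459 sentences yields 1,167 training and 292 validation sentences, and the gap between the two reported accuracies is 5.03 percentage points.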

Published
Jul 1, 2024
How to Cite
MEMON, Adnan Ali et al. Parts-of-speech tagger for Sindhi language using deep neural network architecture. Mehran University Research Journal of Engineering and Technology, [S.l.], v. 43, n. 3, p. 47-55, july 2024. ISSN 2413-7219. Available at: <https://publications.muet.edu.pk/index.php/muetrj/article/view/2768>. Date accessed: 24 nov. 2024. doi: http://dx.doi.org/10.22581/muet1982.2768.
Section
Articles
This is an open access article published by Mehran University of Engineering and Technology, Jamshoro, under the CC BY 4.0 International License.