Lexicon Reduction for Urdu/Arabic Script Based Character Recognition: A Multilingual OCR

  • Saeeda Naz Department of Information Technology, Hazara University, Mansehra, KPK, and Government Post-Graduate Girls College No. 1, Higher Education Department, Abbottabad, KPK
  • Arif Iqbal Umar Department of Information Technology, Hazara University, Mansehra, KPK
  • Muhammad Imran Razzak King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia

Abstract

Arabic script character recognition is a challenging task because of the complexity of the script and the huge number of ligatures. We present a method for developing a multilingual Arabic script OCR (Optical Character Recognition) system and for reducing the lexicon of Arabic script and its derivative languages. The objective of the proposed method is to overcome the large dataset required for Urdu and similar scripts by using the GCT (Ghost Character Theory) concept. Arabic and its sibling script languages share a similar character set; the characters differ in their diacritics and in writing style, such as Naskh or Nasta’liq. With the proposed method, the lexicon for Arabic and Arabic script based languages can be reduced by up to approximately 20 times. The proposed multilingual Arabic script OCR approach has been evaluated on online recognition of Arabic and its derivative languages, such as Urdu, using a BPNN. The results show that the proposed method not only reduces the lexicon but also helps in developing a multilanguage character recognition system for Arabic script.
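
The idea of reducing a lexicon by collapsing letters that differ only in their diacritics can be illustrated with a minimal Python sketch. This is not the authors' implementation: the mapping table below is a small, hypothetical subset chosen for illustration (isolated letter forms only), and the names `GHOST_CLASSES`, `to_ghost` and `reduction_factor` are assumptions introduced here.

```python
# Illustrative sketch only: a hypothetical, partial table that groups
# Arabic/Urdu letters sharing the same dotless ("ghost") base shape.
# It is NOT the mapping used in the paper.
GHOST_CLASSES = {
    "\u066E": "\u0628\u067E\u062A\u0679\u062B",  # beh-like: beh, peh, teh, tteh, theh
    "\u062D": "\u062C\u0686\u062D\u062E",        # hah-like: jeem, tcheh, hah, khah
    "\u062F": "\u062F\u0688\u0630",              # dal-like: dal, ddal, thal
    "\u0631": "\u0631\u0691\u0632\u0698",        # reh-like: reh, rreh, zain, jeh
    "\u0633": "\u0633\u0634",                    # seen-like: seen, sheen
    "\u0635": "\u0635\u0636",                    # sad-like: sad, dad
    "\u0637": "\u0637\u0638",                    # tah-like: tah, zah
    "\u0639": "\u0639\u063A",                    # ain-like: ain, ghain
}

# Invert the table: each letter points to its ghost shape.
TO_GHOST = {
    letter: ghost
    for ghost, letters in GHOST_CLASSES.items()
    for letter in letters
}


def to_ghost(text: str) -> str:
    """Replace every known letter by its ghost shape; leave others unchanged."""
    return "".join(TO_GHOST.get(ch, ch) for ch in text)


def reduction_factor(words: list[str]) -> float:
    """Ratio of distinct words to distinct ghost-shape strings."""
    distinct_words = set(words)
    distinct_ghosts = {to_ghost(w) for w in distinct_words}
    if not distinct_ghosts:
        return 1.0  # empty lexicon: nothing to reduce
    return len(distinct_words) / len(distinct_ghosts)
```

In this toy table, 24 letters collapse into 8 base shapes, so a recognizer would only need to learn the base shapes and could handle the diacritics in a separate step. The reduction obtained on a real lexicon depends on the full mapping and on the ligature set, neither of which is reproduced here.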

Published
Jul 1, 2016
How to Cite
NAZ, Saeeda; UMAR, Arif Iqbal; RAZZAK, Muhammad Imran. Lexicon Reduction for Urdu/Arabic Script Based Character Recognition: A Multilingual OCR. Mehran University Research Journal of Engineering and Technology, [S.l.], v. 35, n. 2, p. 209-216, july 2016. ISSN 2413-7219. Available at: <https://publications.muet.edu.pk/index.php/muetrj/article/view/542>. Date accessed: 20 apr. 2024. doi: http://dx.doi.org/10.22581/muet1982.1602.06.
Section
Articles
This is an Open Access article published by Mehran University of Engineering and Technology, Jamshoro, under the CC BY 4.0 International License.