Zernike Moments Based Handwritten Pashto Character Recognition Using Linear Discriminant Analysis
This paper presents an efficient Optical Character Recognition (OCR) system for offline recognition of isolated handwritten Pashto characters. Developing an OCR system for handwritten characters is challenging because the characters vary in both shape and style, and they also vary from one writer to another. Recognition of handwritten Pashto letters is even more challenging because no standard database of handwritten Pashto characters is available. For experimental and simulation purposes, a handwritten Pashto character database is developed by collecting handwritten samples from university students on A4-sized pages. The collected samples are then scanned, segmented, and preprocessed to form a medium-sized database of 14,784 handwritten Pashto character images (336 distinct handwritten samples for each of the 44 characters in the Pashto script). Zernike moments are used as the feature extractor of the proposed OCR system to extract the features of each individual character. Linear Discriminant Analysis (LDA) is then used as the classifier, operating on the feature map computed with Zernike moments. The proposed system is validated with 10-fold cross-validation, and an overall accuracy of 63.71% is obtained for isolated handwritten Pashto characters.
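To make the feature extraction step concrete, the following is a minimal sketch of how Zernike moment magnitudes could be computed for one character image. It is an illustration under stated assumptions, not the implementation used in this work: the function names (`radial_poly`, `zernike_moment`, `zernike_features`), the maximum order of 4, and the mapping of a square image onto the unit disk are all hypothetical choices, since the abstract does not specify them. The magnitudes are used because they do not change when the character is rotated about the image center.

```python
import cmath
import math


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_{n,|m|}(rho) for n - |m| even and non-negative."""
    m = abs(m)
    if m > n or (n - m) % 2 != 0:
        raise ValueError("need |m| <= n and n - |m| even")
    total = 0.0
    for s in range((n - m) // 2 + 1):
        num = (-1) ** s * math.factorial(n - s)
        den = (math.factorial(s)
               * math.factorial((n + m) // 2 - s)
               * math.factorial((n - m) // 2 - s))
        total += num / den * rho ** (n - 2 * s)
    return total


def zernike_moment(img, n, m):
    """Complex Zernike moment Z_{nm} of a square image given as a list of rows.

    Assumption: pixel centers are mapped so that the disk inscribed in the
    image becomes the unit disk; pixels outside that disk are ignored.
    """
    size = len(img)
    center = (size - 1) / 2.0
    radius = size / 2.0
    acc = 0j
    for r, row in enumerate(img):
        for c, val in enumerate(row):
            if not val:
                continue
            x = (c - center) / radius
            y = (r - center) / radius
            rho = math.hypot(x, y)
            if rho > 1.0:
                continue
            theta = math.atan2(y, x)
            # Multiply by the complex conjugate of the basis function V_{nm}.
            acc += val * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    # One pixel covers an area of (1 / radius)^2 in normalized coordinates.
    return (n + 1) / math.pi * acc / radius ** 2


def zernike_features(img, max_order=4):
    """Magnitudes |Z_{nm}| for 0 <= m <= n <= max_order with n - m even."""
    return [abs(zernike_moment(img, n, m))
            for n in range(max_order + 1)
            for m in range(n + 1)
            if (n - m) % 2 == 0]


# Toy 16x16 binary "character": an L-shaped stroke (illustrative data only).
toy = [[0] * 16 for _ in range(16)]
for i in range(3, 13):
    toy[i][5] = 1
for j in range(5, 11):
    toy[12][j] = 1

features = zernike_features(toy, max_order=4)
```

With a maximum order of 4 this gives a vector of 9 magnitudes per image. Vectors of this kind, one per character image, would form the feature map on which the LDA classifier is trained and evaluated.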