Handwritten Sindhi Character Recognition Using Neural Networks
Abstract
OCR (Optical Character Recognition) is a technology by which machines read images of text and convert them into machine-readable text. Work on languages written with isolated characters, such as German, English and French, is well advanced. In contrast, OCR and ICR (Intelligent Character Recognition) research on the Sindhi script is still at an early stage, and little work has been reported in this area, even though the Sindhi language is rich in culture and history. This paper presents one of the initial steps toward recognizing handwritten Sindhi characters: it recognizes isolated characters of the Sindhi script written by a number of subjects. The subjects were asked to write Sindhi characters in unconstrained form, and the written samples were collected and scanned with a flatbed scanner. The scanned documents were preprocessed by converting them to binary images and removing pepper noise, and the text lines were segmented using the horizontal projection profile. Characters were then extracted from the segmented lines by vertical projection. Features were extracted from each character using zoning, so that the characters could be classified more easily, and a neural network was used for classification. The recognized characters were converted into editable text with an average accuracy of 85%.
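The preprocessing, segmentation and zoning steps described above can be sketched in Python as follows. This is a minimal illustration under assumed details, not the authors' implementation. The binarization threshold (128), the isolated-pixel rule used for pepper-noise removal, the 4 x 4 zone grid and all function names (`binarize`, `remove_pepper`, `segment_lines`, `segment_chars`, `zoning`, `page_features`) are hypothetical. Images are plain lists of rows, with 1 for ink and 0 for paper.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = ink, 0 = background (paper)


def binarize(gray: List[List[int]], threshold: int = 128) -> Image:
    """Map grayscale pixels (0-255) to 1 (dark ink) or 0 (light paper).
    The fixed threshold is an assumption for illustration."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_pepper(img: Image) -> Image:
    """Hypothetical pepper-noise filter: clear any ink pixel whose
    8-neighbourhood contains no other ink pixel."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                ink = sum(img[j][i]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2)))
                if ink - 1 == 0:  # only the pixel itself is ink
                    out[y][x] = 0
    return out


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs (end exclusive) of consecutive non-zero entries."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Image]:
    """Horizontal projection: count ink per row; blank rows separate lines."""
    profile = [sum(row) for row in img]
    return [img[a:b] for a, b in runs(profile)]


def segment_chars(line: Image) -> List[Image]:
    """Vertical projection: count ink per column; blank columns separate characters."""
    profile = [sum(col) for col in zip(*line)]
    return [[row[a:b] for row in line] for a, b in runs(profile)]


def zoning(char: Image, rows: int = 4, cols: int = 4) -> List[float]:
    """Split the character into a rows x cols grid and return the ink
    density of each zone (empty zones on tiny characters give 0.0)."""
    h, w = len(char), len(char[0])
    feats = []
    for zr in range(rows):
        for zc in range(cols):
            y0, y1 = zr * h // rows, (zr + 1) * h // rows
            x0, x1 = zc * w // cols, (zc + 1) * w // cols
            cells = [char[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def page_features(gray: List[List[int]], rows: int = 4, cols: int = 4) -> List[List[float]]:
    """Run the whole sketch: binarize, denoise, segment, then zone each character."""
    img = remove_pepper(binarize(gray))
    return [zoning(ch, rows, cols)
            for line in segment_lines(img)
            for ch in segment_chars(line)]
```

Because the zone boundaries are computed in proportion to each character's height and width, every character yields a vector of the same length (16 values for a 4 x 4 grid). This is the kind of fixed-size input a neural-network classifier expects. Real handwriting would typically also need size normalization and a more robust noise filter (for example, a median filter) than this sketch provides.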