Multi-Digit Handwritten Sindhi Numerals Recognition using SOM Neural Network

This paper presents a multi-digit handwritten Sindhi numeral recognition system based on a SOM (Self-Organizing Map) neural network. Handwritten digit recognition is a challenging task that has been under research for many years. Remarkable work has been done on the recognition of isolated handwritten characters and digits in many languages, such as English, Arabic, Devanagari, Chinese, Urdu and Pashto. However, the literature reviewed does not show any remarkable work on Sindhi numeral recognition. Recognition of Sindhi digits is difficult because of the variety of writing styles and font sizes. Therefore SOM, a NN (Neural Network) method that can recognize digits across writing styles and font sizes, is used. Only one sample is required to train the network for each pair of multi-digit numerals. A database of 4000 samples of two-digit numerals, covering 10-50 and other matching numerals, was collected from 50 users, and the experimental results show that the proposed method achieves an accuracy of 86.89%.


INTRODUCTION
Handwritten digit recognition is a technique to segment, classify, detect and recognize a digit from an image. It is a subfield of pattern recognition and AI (Artificial Intelligence). The recognition process involves detecting digits in an image and then converting them into a machine-readable format such as ASCII (American Standard Code for Information Interchange) [1].
Digit recognition techniques can be classified as either offline or online. In the offline technique, the document with handwritten or typewritten digits is generated first, converted into digital form, stored on disk and then processed.
In the online technique, however, the digit is processed for recognition while it is being written [2]. A lot of research has been carried out on digit recognition systems over the last few decades because of their use in common applications such as bank cheque numbers, vehicle number plates, barcode numbers, postal codes and others.
Handwritten digit recognition systems can also be important for converting historical books and documents into a machine-readable and editable format so that they can be accessed easily. These systems can therefore help automate processes and can be used to improve the interface between human beings and machines in various applications.
Handwritten numeral recognition is a challenging task that has been under research for many years. Numerals written by hand are not uniform: different writers use many different writing styles, and even the same writer may write in different ways at different times [3].
Sindhi is one of the ancient languages of the Indus valley, with a history of hundreds of years, and is spoken by approximately 40 million people in the Sindh province of Pakistan, in many states of India and in various other areas of the world [4,5]. It is taught as a basic language in almost all primary schools of Sindh province and is the second most widely spoken language in Pakistan.
It is difficult to extract accurate features for online Sindhi handwritten numeral recognition because Sindhi numerals are written like Farsi and Arabic numerals. However, with a digital pen or stylus it is easy to write digits on the surface of a touch-screen device. Sindhi numerals are similar to those of the Arabic and Farsi scripts, but most of the digits are written in the old Arabic style. Some Sindhi digits, such as " ", " " and " ", differ in style from Arabic digits. These digits are also similar to Urdu and Hindi digits but have minor differences [6]. The writing style and direction of Sindhi digits also differ from those of Arabic and Urdu digits. Fig. 1 shows the list of isolated numerals used in the Sindhi script and Fig. 2 shows a few of the multi-digit numerals.
In isolated digit recognition, each numeral is assigned to one of ten classes, one per digit. For multi-digit recognition, a string of digits is separated into sub-images, each consisting of a single digit. This process is carried out by the recognition component called the segmenter. Each separate digit is then recognized by the recognizer. Fig. 3 shows the multi-digit recognition technique.
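The segmentation step described above can be illustrated with a minimal sketch. It assumes a binarized image (1 = ink, 0 = background) and splits it wherever at least `min_gap` blank columns separate two runs of ink. This column-projection rule, the function names `segment_columns` and `crop`, and the toy image are illustrative assumptions and are not taken from the proposed system, whose segmenter is not specified here.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image as rows of pixels: 1 = ink, 0 = background


def segment_columns(image: Image, min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of ink runs that are
    separated by at least `min_gap` blank columns (hypothetical rule)."""
    if not image:
        return []
    width = len(image[0])
    # A column counts as inked if any row has a foreground pixel in it.
    inked = [any(row[x] for row in image) for x in range(width)]

    segments: List[Tuple[int, int]] = []
    start = None  # first column of the current ink run, if any
    gap = 0       # number of consecutive blank columns seen inside the run
    for x, has_ink in enumerate(inked):
        if has_ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Close the run at the first blank column of the gap.
                segments.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # The run reaches the right edge, minus any short trailing gap.
        segments.append((start, width - gap))
    return segments


def crop(image: Image, start: int, end: int) -> Image:
    """Cut the sub-image covering columns start..end-1."""
    return [row[start:end] for row in image]


# Toy 3x7 image holding two separate ink blobs (made-up data).
toy = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 1, 0],
]
sub_images = [crop(toy, s, e) for s, e in segment_columns(toy)]
```

Each entry of `sub_images` would then be passed to the recognizer. A projection-based split of this kind fails when handwritten digits touch or overlap, so it should be read only as the simplest possible instance of a segmenter.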
The paper is organized as follows: Section 2 outlines the existing work on numeral recognition for different languages. Section 3 shows the block diagram of the multi-digit handwritten Sindhi numeral recognition system. Section 4 presents the experimental results. Section 5 gives the discussion and Section 6 outlines the conclusions and future work.

LITERATURE REVIEW
Sindhi has not received proper attention from researchers even though it has millions of speakers and writers, whereas a lot of research has been done for many other languages of the world. Singh, et al. [3] presented a technique based on the fusion of global and local features for handwritten Devanagari digit recognition and achieved a result of 95% or better; they also reported that combining global and local features decreased the confusion value. Remarkable work has been done on isolated digit recognition, but little work has been done on multi-font numeral recognition in the recent past [7]. The techniques used so far for multi-font numeral recognition suffer from problems such as increased computation time and the need for a large training set for each sample per font.
A large training set makes it possible to recognize multi-font digits, but the accuracy of the system ... al. [14] have ... The available literature also shows remarkable work on handwritten Arabic digit/numeral recognition [15][16][17][18], whereas not much work has been done on Farsi and Urdu handwritten numeral recognition [19][20][21][22]. This research work presents handwritten Sindhi multi-digit recognition using a neural network.

EXPERIMENTAL RESULTS
The proposed system was applied to a multi-digit database developed by collecting samples from 50 different writers. The developed system can be further extended to recognize multi-digits consisting of multiple pairs. The accuracy of the system varies from 60% to 100% owing to variations in writing style and in the size of the input multi-digit pair. Individual multi-digit recognition accuracy is given in Table 1. A few multi-digits, such as " " and " ", have a 100% accuracy rate. The accuracy rate of matching multi-digits such as " " and " ", or others of the same shape but with a different order of digits, was up to 60%. The results in Table 1 also show that, of the 4000 training samples available in the dataset, the proposed system weakly recognized the multi-digits " " and " ".
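For clarity, per-numeral and overall accuracy figures of the kind reported here can be computed as in the following minimal sketch. The function names and the labels "12" and "21" are hypothetical and chosen only to mimic a pair with the same shapes in a different order; they are not entries from Table 1 or from the collected dataset.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_accuracy(true_labels: Sequence[str],
                       predicted_labels: Sequence[str]) -> Dict[str, float]:
    """Fraction of correctly recognized samples for each true label."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


def overall_accuracy(true_labels: Sequence[str],
                     predicted_labels: Sequence[str]) -> float:
    """Fraction of all samples whose predicted label equals the true label."""
    if not true_labels:
        raise ValueError("at least one sample is required")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Made-up labels: one "21" sample is confused with its reversed form "12".
truth = ["12", "12", "21", "21", "21"]
predicted = ["12", "12", "21", "12", "21"]
by_class = per_class_accuracy(truth, predicted)
total = overall_accuracy(truth, predicted)
```

Reporting both figures is useful because a high overall accuracy can hide individual numerals that are recognized poorly.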

DISCUSSIONS
To measure the accuracy of the proposed system, its performance was tested on 4000 samples of 29 different multi-digit Sindhi numerals. Initially the system was trained and tested on two-digit pairs; however, multi-digits with more than two digits can also be trained and tested using the proposed system.
The digit samples were collected using a touch-screen smart mobile device. The users were asked to write each multi-digit pair with their fingers.
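For readers unfamiliar with SOM-based recognition, the following is a minimal sketch of how a self-organizing map can be trained from one template per class and then used to label a new sample. It is not the implementation used in this study: the 1-D map, the sample-based weight initialization, the linear decay schedules, the class `TinySOM` and the 3x3 toy bitmaps are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinySOM:
    """A 1-D self-organizing map over fixed-length feature vectors."""

    def __init__(self, n_units: int, samples: Sequence[Sequence[float]]):
        # Assumption: unit i starts as a copy of training sample i mod len(samples).
        self.weights: List[List[float]] = [
            [float(v) for v in samples[i % len(samples)]] for i in range(n_units)
        ]

    def winner(self, x: Sequence[float]) -> int:
        """Index of the best-matching unit (closest weight vector) for x."""
        return min(range(len(self.weights)),
                   key=lambda i: sq_dist(self.weights[i], x))

    def train(self, samples: Sequence[Sequence[float]], epochs: int = 50,
              lr0: float = 0.5, sigma0: float = 1.0) -> None:
        for epoch in range(epochs):
            decay = 1.0 - epoch / epochs          # linear decay (assumed schedule)
            lr = lr0 * decay                      # learning rate
            sigma = max(sigma0 * decay, 1e-3)     # neighbourhood radius
            for x in samples:
                bmu = self.winner(x)
                for i, w in enumerate(self.weights):
                    # Gaussian neighbourhood around the winning unit.
                    h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                    for d in range(len(w)):
                        w[d] += lr * h * (x[d] - w[d])


def label_units(som: TinySOM, samples: Sequence[Sequence[float]],
                labels: Sequence[str]) -> List[str]:
    """Give each unit the label of the training sample nearest to its weights."""
    return [
        labels[min(range(len(samples)), key=lambda j: sq_dist(w, samples[j]))]
        for w in som.weights
    ]


def classify(som: TinySOM, unit_labels: Sequence[str], x: Sequence[float]) -> str:
    """Label of the best-matching unit for input x."""
    return unit_labels[som.winner(x)]


# Two flattened 3x3 toy bitmaps, one template per class (made-up data).
VERTICAL = [0, 1, 0, 0, 1, 0, 0, 1, 0]
HORIZONTAL = [0, 0, 0, 1, 1, 1, 0, 0, 0]
templates = [VERTICAL, HORIZONTAL]
names = ["vertical", "horizontal"]

som = TinySOM(n_units=2, samples=templates)
som.train(templates)
unit_labels = label_units(som, templates, names)

noisy_vertical = [0, 1, 0, 0, 1, 0, 0, 1, 1]  # one extra ink pixel
print(classify(som, unit_labels, noisy_vertical))
```

In a real system the input vectors would be features extracted from the finger-written samples rather than raw toy bitmaps, and the map would need enough units to cover all numeral classes.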

CONCLUSIONS
In this research study a system was developed for recognition of handwritten multi-digit Sindhi numerals