Artificial Urdu Text Detection and Localization from Individual Video Frames

Salahuddin Unar; Akhtar Hussain Jalbani; Muhammad Moazzam Jawaid; Mohsin Shaikh; Asghar Ali Chandio

doi:10.22581/muet1982.1802.18

Salahuddin Unar, School of Computer Science and Technology, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China.
Akhtar Hussain Jalbani, Department of Information Technology, Quaid-e-Awam University of Engineering, Science and Technology, Nawabshah.
Muhammad Moazzam Jawaid, Department of Computer System Engineering, Mehran University of Engineering and Technology, Jamshoro.
Mohsin Shaikh, Quaid-e-Awam University College of Engineering, Science and Technology, Larkano.
Asghar Ali Chandio, School of Engineering and Information Technology, University of New South Wales, Canberra, Australia.

Keywords: Text Detection, Artificial Urdu Text, Video Images, Maximally Stable Extremal Region

Abstract

In the current era of technology, information acquisition from images and videos has become an important task owing to the rapid development of data mining and machine learning. The information can be textual, visual, or a combination of both. Text appearing in images or videos is a significant source of information and plays a vital role in understanding their content. Developing a unified method to detect text is hard, because textual properties (font, size, color, illumination, orientation, etc.) vary and the background may be complex. So far, the multimedia and computer vision community has not been able to standardize an ideal approach for extracting text reliably. In this paper, a novel method is proposed to detect and localize artificial Urdu text in individual video frames. First, Sobel and Canny edge detection operators are applied to the input frame, and the resulting edge maps are merged with the regions detected by MSER (Maximally Stable Extremal Regions). Next, geometric constraints are applied to eliminate obvious non-text regions that show very large or very small variations. Non-text regions are further refined using the stroke width transform. An SVM (Support Vector Machine) classifier is then trained to classify text and non-text objects. Finally, bounding boxes are used to localize the text. Experimental results show that the proposed method is more robust and efficient than state-of-the-art methods.
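
The geometric filtering and bounding-box localization steps mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the `Region` type, the function names, and every threshold (area fraction, aspect ratio, fill ratio, merge gap) are assumptions chosen for illustration, and the candidate regions are hand-written stand-ins for what an edge/MSER stage might produce.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical candidate region: bounding box plus foreground pixel count."""
    x: int
    y: int
    w: int
    h: int
    pixels: int


def passes_geometry(r: Region, frame_w: int, frame_h: int,
                    min_area_frac: float = 0.0001, max_area_frac: float = 0.25,
                    min_aspect: float = 0.1, max_aspect: float = 10.0,
                    min_fill: float = 0.1) -> bool:
    """Reject regions that are implausibly small, large, elongated, or sparse.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    if r.w <= 0 or r.h <= 0:
        return False
    box_area = r.w * r.h
    area_frac = box_area / (frame_w * frame_h)
    aspect = r.w / r.h
    fill = r.pixels / box_area
    return (min_area_frac <= area_frac <= max_area_frac
            and min_aspect <= aspect <= max_aspect
            and fill >= min_fill)


def merge_into_lines(regions: List[Region], max_gap: int = 10) -> List[Tuple[int, int, int, int]]:
    """Greedily merge boxes that overlap vertically and lie close horizontally."""
    lines: List[List[int]] = []  # each entry is [x0, y0, x1, y1]
    for r in sorted(regions, key=lambda reg: reg.x):
        for box in lines:
            v_overlap = min(box[3], r.y + r.h) - max(box[1], r.y)
            h_gap = r.x - box[2]  # negative when the boxes overlap horizontally
            if v_overlap > 0 and h_gap <= max_gap:
                box[0] = min(box[0], r.x)
                box[1] = min(box[1], r.y)
                box[2] = max(box[2], r.x + r.w)
                box[3] = max(box[3], r.y + r.h)
                break
        else:
            lines.append([r.x, r.y, r.x + r.w, r.y + r.h])
    return [(x0, y0, x1 - x0, y1 - y0) for x0, y0, x1, y1 in lines]


def localize(regions: List[Region], frame_w: int, frame_h: int) -> List[Tuple[int, int, int, int]]:
    """Filter candidates by geometry, then return merged (x, y, w, h) boxes."""
    kept = [r for r in regions if passes_geometry(r, frame_w, frame_h)]
    return merge_into_lines(kept)


# Hand-made candidates for a 640x480 frame (purely illustrative data).
candidates = [
    Region(100, 400, 20, 30, 300),     # character-sized, kept
    Region(125, 402, 18, 28, 250),     # character-sized, kept
    Region(148, 398, 22, 32, 350),     # character-sized, kept
    Region(0, 0, 640, 300, 150000),    # covers most of the frame, rejected
    Region(300, 100, 2, 2, 4),         # tiny speck, rejected
    Region(50, 200, 200, 3, 500),      # extreme aspect ratio, rejected
]

print(localize(candidates, 640, 480))  # [(100, 398, 70, 32)]
```

In this sketch the three character-sized candidates survive the filter and are merged into a single line-level box, while the oversized, tiny, and extremely elongated candidates are discarded. A real pipeline would pass the surviving regions through stroke-width analysis and a trained classifier before drawing boxes.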