Review of Advances in Digital Recognition of Indian Language Manuscripts

Bhavesh Kataria, Harikrishna B. Jethva
DOI: https://doi.org/10.32628/ijsrset1841215
2018-01-02
Abstract: Digital content creation and document management in Indian languages are still at a developing stage. OCR has become an administrative requirement for effective governance and daily activities. Scripts ranging from medieval to contemporary times are of literary and political importance. Present research initiatives highlight the importance of, and the need for, efforts to recognize printed and handwritten documents written in languages of Indian origin. This paper reviews the state of the various scripts in use, from the medieval to the present era, and explores the prospects for digital recognition of handwritten and printed texts, thereby pointing toward future trends in developing restoration software for Indic scripts. While OCR systems for Indic scripts such as Devanagari have attained good results and continue to improve in accuracy, many medieval and ancient scripts have seen very few attempts. The challenge stems from the number of languages and the diversity of their scripts, and the scarcity of digitized linguistic resources makes the task harder still. The paper also highlights the characteristics of scripts of Indic origin and the challenges in recognizing them. For the most part, digital recognition has been limited to simple numerals and isolated characters. The paper enumerates the highest known performance of OCR attempts for important Indic scripts and suggests possible approaches, including statistical and soft-computing methods, for recognizing scripts of medieval times in particular.