Uyghur, Chinese and English Multilingual Document Recognition

Jin Jian-ming, Wang Hua, Ding Xiao-qing
2006-01-01
Abstract: Uyghur, Chinese and English scripts have entirely different characteristics. This paper implements, for the first time, a multilingual document recognition system for Uyghur, Chinese and English. The system follows two design principles for multilingual OCR: “multi-layer character language estimation” and “suitable adjustment”. First, the language of each text block is estimated from the characteristics of the three scripts. Next, a language-specific character segmentation algorithm is applied to each text block. The character recognition confidence is then used to judge whether the segmentation result and the language estimate for that block are correct. Experimental results show that the recognition accuracy on Uyghur, Chinese and English multilingual documents reaches 96.4% or higher.
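
The confidence-based check described in the abstract can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The per-language recognizers are hypothetical stand-ins that take a text block and return (character, confidence) pairs. The 0.8 acceptance threshold and the order in which alternative languages are tried are illustrative choices. The names recognize_block, mean_confidence and BlockResult are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical interface: a language-specific segmenter + recognizer that maps a
# text block (opaque here) to (character, confidence) pairs.
Recognizer = Callable[[object], List[Tuple[str, float]]]


@dataclass
class BlockResult:
    language: str
    text: str
    mean_confidence: float


def mean_confidence(chars: List[Tuple[str, float]]) -> float:
    """Average recognition confidence; an empty result counts as 0."""
    if not chars:
        return 0.0
    return sum(conf for _, conf in chars) / len(chars)


def recognize_block(
    block: object,
    estimated_language: str,
    candidate_languages: Sequence[str],
    recognizers: Dict[str, Recognizer],
    threshold: float = 0.8,  # illustrative value, not taken from the paper
) -> BlockResult:
    """Recognize one block, retrying other languages when confidence is low."""
    # The estimated language is tried first; the others act as fallbacks.
    order = [estimated_language] + [
        lang for lang in candidate_languages if lang != estimated_language
    ]
    best: Optional[BlockResult] = None
    for lang in order:
        chars = recognizers[lang](block)
        result = BlockResult(
            language=lang,
            text="".join(ch for ch, _ in chars),
            mean_confidence=mean_confidence(chars),
        )
        if result.mean_confidence >= threshold:
            # Confidence is high enough: accept this language and segmentation.
            return result
        if best is None or result.mean_confidence > best.mean_confidence:
            best = result
    # No candidate passed the threshold: keep the most confident attempt.
    assert best is not None  # order always contains at least one language
    return best


# Usage with stub recognizers (placeholder outputs, not real OCR results).
stubs: Dict[str, Recognizer] = {
    "uyghur": lambda block: [("?", 0.4), ("?", 0.5)],
    "chinese": lambda block: [("中", 0.95), ("文", 0.9)],
    "english": lambda block: [("a", 0.3)],
}
result = recognize_block(None, "uyghur", ["uyghur", "chinese", "english"], stubs)
print(result.language, result.text)  # chinese 中文
```

Trying the estimated language first keeps the common case cheap. Alternative languages are attempted only when confidence is low, which is one plausible reading of “suitable adjustment”.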