Multilingual document recognition research and its application in China

Liangrui Peng, Changsong Liu, Xiaoqing Ding, Hua Wang
DOI: https://doi.org/10.1109/DIAL.2006.27
2006-01-01
Abstract: This paper presents research on multilingual document recognition technology and its application in China, which is useful for building multilingual digital libraries. The key technologies and general system framework of multilingual OCR (optical character recognition) are summarized, based on previous research on Chinese, Japanese, Korean, and English and on recent advances for Tibetan, Uighur, Kazakh, Kirghiz, Arabic, and Mongolian. The key technologies include statistical character recognition, structural analysis for discriminating similar characters, character segmentation, script identification, and post-processing. Applying multilingual document recognition systems to digital libraries and Web site content construction will help people who use various languages retrieve knowledge.