An OCR post-processing approach based on multi-knowledge

Li Zhuang, Xiaoyan Zhu
DOI: https://doi.org/10.1007/11552413_50
2005-01-01
Abstract: This paper proposes an OCR post-processing approach based on multi-knowledge, which integrates language knowledge with the candidate distance information given by the OCR engine. In this approach, a statistical language model and a semantic lexicon are combined, and candidate distance information is used to reduce the size of the search space. Experimental results show that the approach is effective: after post-processing, the recognition accuracy on the test set increases from 58.45% to 83.73%, which corresponds to a 60.84% reduction in errors.
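The error-reduction figure follows from the two accuracy rates: the error rate falls from 100% - 58.45% = 41.55% to 100% - 83.73% = 16.27%, and the relative drop is (41.55 - 16.27) / 41.55. The short Python sketch below reproduces that arithmetic. The function name `relative_error_reduction` is illustrative and does not come from the paper, and the code assumes the reported figure is a relative reduction in error rate.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Return the relative error reduction in percent.

    Both arguments are accuracy rates in percent (0 to 100).
    Assumes "error reduction" means the relative drop in error rate.
    """
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    if err_before <= 0:
        raise ValueError("baseline accuracy must be below 100%")
    return 100.0 * (err_before - err_after) / err_before


# Accuracy figures reported in the abstract.
reduction = relative_error_reduction(58.45, 83.73)
print(f"{reduction:.2f}%")  # prints "60.84%"
```

Measuring the gain relative to the baseline error rate, rather than as the 25.28-point absolute accuracy gain, makes results comparable across test sets with different baseline accuracy.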