Web Knowledge Base Improved OCR Correction for Chinese Business Cards

Xiaoping Wang, Yanghua Xiao, Wei Wang
DOI: https://doi.org/10.1007/978-3-319-21042-1_65
2015-01-01
Abstract: In the field of Optical Character Recognition (OCR), improving recognition accuracy has been studied extensively over the past decades. In this paper, in contrast to previously published model-based correction methods, a knowledge base was applied to an OCR correction system from the perspective of linked knowledge. A pipelined method integrating selectivity-aware pre-filtering with text-level and image-level comparison was explored to identify the best candidate with better efficiency and accuracy. For more reliable comparison of company names, weighting coefficients derived from Wikipedia were applied to distinguish differences in importance. Moreover, the traditional Levenshtein distance was generalized to an image-based Levenshtein measure to better distinguish strings whose text-level similarity is close. The experimental results demonstrated that the proposed system performed more effectively than the baseline.
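The idea of generalizing Levenshtein distance so that visually similar characters cost less to substitute can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how image similarity is computed, so the sketch assumes a hypothetical lookup table (`VISUALLY_SIMILAR`) with made-up costs as a stand-in for a real glyph-image comparison. The function names and the example strings are likewise illustrative.

```python
from typing import Callable

def weighted_levenshtein(a: str, b: str,
                         sub_cost: Callable[[str, str], float]) -> float:
    """Edit distance with unit insert/delete cost and a pluggable
    substitution cost. With a 0/1 substitution cost this reduces to
    the classic Levenshtein distance."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1.0,                  # delete ca
                curr[j - 1] + 1.0,              # insert cb
                prev[j - 1] + sub_cost(ca, cb)  # substitute ca -> cb
            ))
        prev = curr
    return prev[-1]

def plain_cost(x: str, y: str) -> float:
    """Classic substitution cost: 0 if equal, 1 otherwise."""
    return 0.0 if x == y else 1.0

# ASSUMPTION: hand-picked pairs and costs, standing in for a score
# that a real system would derive from rendered character images.
VISUALLY_SIMILAR = {
    frozenset(("未", "末")): 0.2,
    frozenset(("己", "已")): 0.2,
}

def toy_glyph_cost(x: str, y: str) -> float:
    """Cheaper substitution for characters that look alike."""
    if x == y:
        return 0.0
    return VISUALLY_SIMILAR.get(frozenset((x, y)), 1.0)

if __name__ == "__main__":
    ocr_output = "末来科技"   # hypothetical OCR result
    look_alike = "未来科技"   # differs by a visually similar character
    unrelated = "东来科技"    # differs by a dissimilar character
    for cand in (look_alike, unrelated):
        print(cand,
              weighted_levenshtein(ocr_output, cand, plain_cost),
              weighted_levenshtein(ocr_output, cand, toy_glyph_cost))
```

Under the plain cost both candidates are one edit away from the OCR output, so text-level distance alone cannot rank them. With the glyph-aware cost the look-alike candidate scores lower, which is the kind of tie-breaking the abstract attributes to the image-based measure.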