New Post-Processing Method Based on Noisy Channel Model for Chinese Character Recognition

Yuanxiang Li, Xiaoqing Ding, Changsong Liu
DOI: https://doi.org/10.3321/j.issn:1000-0054.2001.01.007
2001-01-01
Abstract: In Chinese document recognition incorporating post-processing, the document recognition rate is limited if the candidate sets do not contain the correct characters. The noisy channel model is used to develop a method for expanding the candidate sets. The method uses the original candidates given by the recognizer to conjecture the most likely correct characters and then combines them with the original candidates to produce new candidate sets. In a test with 300 off-line handwritten samples, the top 50 candidates of the new candidate sets achieved a 37.88% average error reduction rate in comparison with the original candidate sets. Using the character-based bigram language model, and after expanding the candidate sets using the method proposed here, the average recognition rate for off-line handwritten Chinese documents (about 80,000 characters) is 95.82%, compared with the average recognition rate of 93.93% without candidate set expansion. On average, a 31.14% error reduction rate is achieved.
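The candidate-expansion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the scoring rule (summing an assumed confusion probability times a prior over the observed candidates), the function names `expand_candidates` and `error_reduction_rate`, and the toy probability tables are all hypothetical. The second function assumes the error reduction rate is the relative drop in error rate between two recognition rates.

```python
from collections import defaultdict


def expand_candidates(candidates, confusion, prior, extra=2):
    """Append conjectured characters to a recognizer's candidate list.

    candidates: characters output by the recognizer, best first.
    confusion[c][o]: ASSUMED P(recognizer outputs o | true character c).
    prior[c]: ASSUMED prior probability of character c.
    extra: maximum number of conjectured characters to add.
    """
    scores = defaultdict(float)
    for true_char, outputs in confusion.items():
        if true_char in candidates:
            continue  # already present, nothing to conjecture
        for observed in candidates:
            # Noisy-channel style score: P(observed | true) * P(true).
            scores[true_char] += outputs.get(observed, 0.0) * prior.get(true_char, 0.0)
    # Highest score first; ties broken by character for determinism.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    conjectured = [c for c in ranked if scores[c] > 0][:extra]
    return list(candidates) + conjectured


def error_reduction_rate(rate_before, rate_after):
    """Relative reduction in error rate, in percent (rates given in percent)."""
    return (rate_after - rate_before) / (100.0 - rate_before) * 100.0


if __name__ == "__main__":
    # Toy, made-up tables; real values would be estimated from recognizer output.
    confusion = {
        "未": {"末": 0.3, "未": 0.7},
        "土": {"士": 0.2, "土": 0.8},
        "日": {"曰": 0.1, "日": 0.9},
    }
    prior = {"未": 0.5, "土": 0.3, "日": 0.2}
    print(expand_candidates(["末", "士"], confusion, prior))
    print(round(error_reduction_rate(93.93, 95.82), 2))
```

Under that assumed definition, the recognition rates quoted in the abstract (93.93% and 95.82%) give an error reduction of about 31.14%, which matches the reported figure.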