Influence of Language Models and Candidate Set Size on Contextual Post-processing for Chinese Script Recognition.

YX Li, CL Tan
DOI: https://doi.org/10.1109/icpr.2004.1334295
2004-01-01
Abstract: In the Chinese language, a word consisting of one or more characters is the basic syntactically meaningful unit; however, each character in a word also has a definite meaning of its own. We compare the perplexities of four n-gram language models (character-based bigram, character-based trigram, word-based bigram, and class-based bigram) and their influence on the performance of contextual post-processing of Chinese scripts in an offline handwritten Chinese character recognition system. We also examine in detail how the candidate set size affects the performance of contextual post-processing, and show that the number of candidates should vary from script to script.
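To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a character-based bigram model with add-one smoothing (the abstract does not state the smoothing method), computes perplexity on a string, and then re-ranks recognizer candidates with a Viterbi search that keeps only the top k candidates per character position. The names train_char_bigram, log_prob, perplexity and decode, the toy corpus, and the candidate lists are all hypothetical, and recognizer confidence scores are ignored for brevity.

```python
import math
from collections import Counter


def train_char_bigram(corpus):
    """Count bigram contexts and character bigrams over a list of sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        chars = ["<s>"] + list(sentence)
        for prev, cur in zip(chars, chars[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    # Reserve one extra vocabulary slot for unseen characters (an assumption).
    vocab_size = len({c for s in corpus for c in s}) + 1
    return contexts, bigrams, vocab_size


def log_prob(prev, cur, model):
    """Add-one smoothed log P(cur | prev)."""
    contexts, bigrams, vocab_size = model
    return math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))


def perplexity(text, model):
    """Per-character perplexity of a non-empty string under the bigram model."""
    chars = ["<s>"] + list(text)
    total = sum(log_prob(p, c, model) for p, c in zip(chars, chars[1:]))
    return math.exp(-total / (len(chars) - 1))


def decode(candidate_sets, model, k):
    """Viterbi search over the top-k candidates at each character position.

    candidate_sets[i] lists the recognizer's candidates for position i,
    best first. Only the language model score is used in this sketch.
    """
    best = {"<s>": (0.0, "")}  # last character -> (log score, best path so far)
    for candidates in candidate_sets:
        step = {}
        for cur in candidates[:k]:
            step[cur] = max(
                (score + log_prob(prev, cur, model), path + cur)
                for prev, (score, path) in best.items()
            )
        best = step
    return max(best.values())[1]


# Toy data, invented for illustration only.
model = train_char_bigram(["我们是学生", "我们是老师"])
candidates = [["找", "我"], ["们", "门"], ["是", "足"]]

print(perplexity("我们是学生", model))
print(decode(candidates, model, k=1))  # top-1 only: no room for correction
print(decode(candidates, model, k=2))  # a second candidate lets the model correct
```

With k=1 the search can only return the recognizer's first choices, so the language model has nothing to correct; with k=2 the correct character at the first position enters the lattice. Larger k raises the chance that the correct character is present but also adds confusable competitors and cost, which is the trade-off behind the abstract's point about candidate set size.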