Error feedback based lexical entity extraction for Chinese language modeling

Yi Liu, Jing Hua, Xiangang Li, Xihong Wu
DOI: https://doi.org/10.1109/CISP.2013.6743873
2013-01-01
Abstract: Chinese, unlike Western languages, has no standard definition of a word. Choosing a suitable lexicon therefore plays an important role in Chinese language modeling. This paper proposes a novel method for constructing the lexicon automatically. Rather than depending on statistical measures of text features, the method is driven directly by error feedback from the target task, which in this paper is phoneme-to-grapheme conversion. The process consists of two iterative phases: Phase One selects individual words from a large manual lexicon, and Phase Two extracts compound words, building on the result of Phase One. Experiments on phoneme-to-grapheme conversion show that the method achieves absolute reductions in character error rate of 1.09% for Phase One and 0.38% for Phase Two, compared with baseline lexicons of the same size generated by the conventional word-frequency method.
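The abstract reports its results as absolute reductions in character error rate (CER) but does not say how CER is computed. The sketch below assumes the common definition: character-level edit distance between the reference and the converted text, divided by the total number of reference characters. The function names and the sample sentences are illustrative only and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps):
    """CER over a test set: total edit errors / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Hypothetical outputs from two lexicons on the same one-sentence test set.
refs = ["今天天气好"]
baseline_hyps = ["今天天汽好"]   # one wrong character out of five
proposed_hyps = ["今天天气好"]   # fully correct

baseline_cer = character_error_rate(refs, baseline_hyps)
proposed_cer = character_error_rate(refs, proposed_hyps)
absolute_reduction = baseline_cer - proposed_cer
```

Under this definition, an absolute reduction is a plain difference between two CER values, so the reported 1.09% would mean the CER dropped by 1.09 percentage points relative to the same-size baseline lexicon.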