Lexicon Optimization for Chinese Language Modeling

Jun Zhao, Jianfeng Gao, Eric Chang, Mingjing Li
2000-01-01
Abstract: We present an approach to lexicon optimization for Chinese language modeling. The method is an iterative procedure with two phases: lexicon generation and lexicon pruning. In the first phase, we extract appropriate new words from a very large training corpus using statistical approaches. In the second phase, we prune the lexicon to fit a preset memory limit using a perplexity minimization criterion. Experimental results show up to a 6% reduction in character perplexity compared with the baseline lexicon.
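The abstract reports results in character perplexity, a metric that stays comparable when the lexicon (and therefore the word segmentation) changes. The following is a minimal sketch of how this metric is commonly computed, not the authors' implementation: the word-level log probability of the corpus is normalized by the number of characters rather than the number of words. The function name `char_perplexity` and the `word_logprob` callable are hypothetical and stand in for whatever language model is being evaluated.

```python
def char_perplexity(segmented_corpus, word_logprob):
    """Character perplexity of a word-segmented corpus.

    segmented_corpus: list of sentences, each a list of word strings.
    word_logprob: hypothetical callable (word, history) -> log2 P(word | history),
                  supplied by the language model under evaluation.
    """
    total_log2 = 0.0   # sum of log2 probabilities over all words
    total_chars = 0    # normalizer: characters, not words
    for sentence in segmented_corpus:
        history = []
        for word in sentence:
            total_log2 += word_logprob(word, tuple(history))
            total_chars += len(word)
            history.append(word)
    if total_chars == 0:
        raise ValueError("corpus contains no characters")
    # Perplexity per character: 2 ** (negative average log2 probability per character)
    return 2.0 ** (-total_log2 / total_chars)
```

Because the normalizer is the character count, two lexicons that segment the same text into different numbers of words can still be compared on the same scale, which is what a lexicon pruning criterion based on perplexity needs.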