Improved Katz smoothing for language modeling in speech recognition

Genqing Wu, Fang Zheng, Wenhu Wu, Mingxing Xu, Ling Jin
DOI: https://doi.org/10.21437/icslp.2002-309
2002-01-01
Abstract: In this paper, a new method is proposed to improve the canonical Katz back-off smoothing technique in language modeling. The process of Katz smoothing is analyzed in detail, and global discounting parameters are selected for discounting. Furthermore, a modified version of the formula for the discounting parameters is proposed, in which the discounting parameters are determined not only by the occurrence counts of the n-gram units but also by the low-order history frequencies. This modification makes the smoothing more reasonable for n-gram units that have homophonic histories (histories with the same pronunciation). The new method is tested on a Chinese Pinyin-to-character conversion system (where Pinyin is the pronunciation string), and the results show that the improved method achieves a surprising reduction in both perplexity and Chinese character error rate.
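For orientation, the canonical Katz discounting that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the standard Good-Turing based discount ratios computed globally from count-of-counts, not the modified formula proposed in the paper, which the abstract does not state. The function name `katz_discounts`, the threshold default `k=5`, and the dictionary input format are assumptions made for this example.

```python
from collections import Counter


def katz_discounts(ngram_counts, k=5):
    """Return {r: d_r} for 1 <= r <= k, the standard Katz discount ratios.

    ngram_counts maps each n-gram (any hashable) to its observed count.
    Counts above k are left undiscounted in Katz smoothing (d_r = 1),
    so they are not included in the result.
    """
    # n[r] = number of distinct n-grams observed exactly r times
    n = Counter(ngram_counts.values())
    if n[1] == 0:
        raise ValueError("need at least one n-gram with count 1")

    # Shared term (k + 1) * n_{k+1} / n_1 from Katz's formula
    common = (k + 1) * n[k + 1] / n[1]
    if common == 1:
        raise ValueError("degenerate count-of-counts: denominator is zero")

    discounts = {}
    for r in range(1, k + 1):
        if n[r] == 0:
            continue  # no n-grams with this count, nothing to discount
        # Good-Turing adjusted count r* = (r + 1) * n_{r+1} / n_r
        r_star = (r + 1) * n[r + 1] / n[r]
        discounts[r] = (r_star / r - common) / (1 - common)
    return discounts


# Toy data (hypothetical): 10 bigrams seen once, 4 seen twice, 1 seen 3 times
toy_counts = {("a", i): 1 for i in range(10)}
toy_counts.update({("b", i): 2 for i in range(4)})
toy_counts[("c", 0)] = 3
toy_discounts = katz_discounts(toy_counts, k=2)
```

With these ratios, the total count mass removed from n-grams seen at most `k` times equals the number of n-grams seen exactly once, which is the mass Katz smoothing redistributes to unseen n-grams through back-off.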