Improved Katz smoothing algorithms with POS information

Yan ZHAO, Xiao-long WANG, Zhi-ming XU, Bing-quan LIU
DOI: https://doi.org/10.3321/j.issn:0367-6234.2007.09.024
2007-01-01
Abstract: This paper first reviewed existing smoothing methods for N-gram models and implemented the Absolute, W-B (Witten-Bell), and Katz smoothing algorithms. The traditional Katz algorithm could not discount the probability when smoothing Chinese collocations. To resolve this problem, we constructed a new discounting coefficient based on part-of-speech (POS) information. With the new coefficient, the discount decreased as word frequency increased, and the larger the count of following words, the larger the discount. Both properties satisfy the requirements of smoothing methods. Experimental results showed that the improved Katz smoothing algorithm not only decreased the cross entropy of the language model but also increased the F-measure when applied to Chinese word segmentation.
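For context, the traditional Katz discounting coefficient that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard Good-Turing-based formula, d_r = (r*/r - c) / (1 - c) with r* = (r+1) n_{r+1} / n_r and c = (k+1) n_{k+1} / n_1, and not the POS-based coefficient proposed in the paper, which the abstract does not specify. The function name `katz_discounts`, the cutoff `k`, and the toy counts are assumptions made for illustration only.

```python
from collections import Counter


def katz_discounts(ngram_counts, k=5):
    """Return {r: d_r} for 1 <= r <= k under standard Katz discounting.

    ngram_counts maps each n-gram to its observed count (hypothetical input).
    Counts above k are left undiscounted in Katz smoothing, so they are
    not included in the result.
    """
    # n[r] = number of distinct n-grams observed exactly r times
    n = Counter(ngram_counts.values())
    if any(n[r] == 0 for r in range(1, k + 2)):
        raise ValueError("count-of-counts must be non-zero for r = 1..k+1")

    # Shared term c = (k + 1) * n_{k+1} / n_1
    common = (k + 1) * n[k + 1] / n[1]
    if common == 1:
        raise ValueError("degenerate count-of-counts: 1 - c is zero")

    discounts = {}
    for r in range(1, k + 1):
        r_star = (r + 1) * n[r + 1] / n[r]  # Good-Turing adjusted count
        discounts[r] = (r_star / r - common) / (1 - common)
    return discounts


# Toy bigram counts: six bigrams seen once, two seen twice, one seen three times
toy_counts = {("w", str(i)): 1 for i in range(6)}
toy_counts.update({("x", "a"): 2, ("x", "b"): 2, ("y", "a"): 3})
toy_discounts = katz_discounts(toy_counts, k=2)
```

Because each d_r depends entirely on the count-of-counts n_r, sparse or irregular count distributions can push the coefficients toward 1 (no discount) or outside the interval (0, 1), which is one reason alternative coefficients are of interest.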