Smoothing Algorithm of the Task Adaptation Chinese N-gram Model

JIANG Minghu, ZHU Xiaoyan, YUAN Baozong
DOI: https://doi.org/10.3321/j.issn:1000-0054.1999.09.027
1999-01-01
Abstract: Data sparseness in Chinese word N-gram models and changes of application domain degrade the recognition performance of previously trained statistical models. This paper proposes a smoothing algorithm for Chinese N-gram models with task adaptation ability. Forward and backward statistical models from 0-gram to 3-gram were built in two application domains. Drawing on the success of HMMs in speech recognition, the Baum-Welch algorithm was applied to optimize the model weights, each of which represents the reliability of the corresponding statistical model. A 5-gram probability smoothing algorithm was derived from the forward and backward 3-grams to offset the sparseness of the probability matrix. Statistics from the “People's Daily” corpus served as the initial model, and the “PC World” corpus, representing the changed domain, was used for successive training to obtain a task-adapted 3-gram model. Experimental results show that the 5-gram model obtained from the forward and backward 3-gram models imposes a stronger grammatical constraint at a lower storage cost, and the perplexity of the statistical models is greatly reduced.
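The weight optimization described in the abstract can be illustrated with a minimal sketch. It assumes the weights act as linear interpolation coefficients over the 0-gram to 3-gram models and uses the standard EM re-estimation for mixture weights (a special case of Baum-Welch). The abstract does not give the paper's exact formulation, so the function names and the held-out probabilities below are hypothetical.

```python
import math

def em_interpolation_weights(component_probs, iterations=50):
    """Estimate mixture weights by EM.

    component_probs: one row per held-out token; each row holds the
    probability that each component model (0-gram .. 3-gram) assigns
    to that token. This layout is an assumption made for illustration.
    """
    k = len(component_probs[0])
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(iterations):
        expected = [0.0] * k
        for row in component_probs:
            mix = sum(w * p for w, p in zip(weights, row))
            if mix == 0.0:
                continue  # token unseen by every component; skip it
            # E-step: posterior responsibility of each component
            for i in range(k):
                expected[i] += weights[i] * row[i] / mix
        # M-step: renormalize the expected counts into new weights
        total = sum(expected)
        weights = [e / total for e in expected]
    return weights

def perplexity(component_probs, weights):
    """Perplexity of the interpolated model on the held-out tokens."""
    log_sum = 0.0
    for row in component_probs:
        mix = sum(w * p for w, p in zip(weights, row))
        log_sum += math.log(mix)
    return math.exp(-log_sum / len(component_probs))

# Made-up probabilities: columns are 0-gram, 1-gram, 2-gram, 3-gram.
held_out = [
    [0.001, 0.01, 0.20, 0.50],
    [0.001, 0.02, 0.00, 0.00],  # higher orders unseen: sparse data
    [0.001, 0.05, 0.10, 0.00],
    [0.001, 0.01, 0.30, 0.60],
]
uniform = [0.25] * 4
weights = em_interpolation_weights(held_out)
print(weights)
print(perplexity(held_out, uniform), perplexity(held_out, weights))
```

Because every EM iteration cannot decrease the held-out likelihood, the perplexity under the re-estimated weights is never higher than under the uniform starting weights. A component that is often zero on held-out data, as sparse higher-order models tend to be, ends up with a smaller weight, which matches the reading of each weight as a reliability measure.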