Comparative Study on Algorithms of Limited Corpus Language Model

Sun Shouan, Yang Genke, Yang Zuhua
DOI: https://doi.org/10.3969/j.issn.1007-757X.2010.12.006
2010-01-01
Abstract: In recent years, with the rapid development of science and technology and the widespread use of the Internet, the volume of information has grown dramatically. Training a language model from a corpus plays an important role in improving system performance. For translation tasks in specific domains, however, relevant text is often scarce, so a large-scale corpus cannot be built to train the language model, which leads to a serious data sparsity problem. This paper focuses on the choice of smoothing algorithm for language models trained on a limited corpus. Several comparative experiments show that the Good-Turing method can exploit its advantage in re-estimating low-frequency words, effectively address the problems caused by data sparsity, and improve the efficiency of the language model under a limited corpus.
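To illustrate the low-frequency re-estimation that the abstract credits to Good-Turing, the following is a minimal sketch of the textbook adjusted count r* = (r + 1) N_{r+1} / N_r, where N_r is the number of distinct words seen exactly r times. It is not the authors' implementation: the function name `good_turing_counts`, the unigram setting, and the fallback to the raw count when N_{r+1} is zero are all assumptions made for this example.

```python
from collections import Counter


def good_turing_counts(tokens):
    """Return (adjusted_counts, unseen_mass) for a list of tokens.

    Hypothetical helper for illustration only. Uses the basic Good-Turing
    estimate r* = (r + 1) * N_{r+1} / N_r on unigram counts.
    """
    counts = Counter(tokens)                  # raw count r for each word
    total = sum(counts.values())              # total number of tokens N
    freq_of_freq = Counter(counts.values())   # N_r: number of words seen r times

    adjusted = {}
    for word, r in counts.items():
        n_r = freq_of_freq[r]
        n_r_plus_1 = freq_of_freq.get(r + 1, 0)
        if n_r_plus_1 > 0:
            adjusted[word] = (r + 1) * n_r_plus_1 / n_r
        else:
            # Assumption: keep the raw count when N_{r+1} is zero, which is
            # common for the highest frequencies in a small corpus.
            adjusted[word] = float(r)

    # Probability mass reserved for unseen words: N_1 / N.
    unseen_mass = freq_of_freq.get(1, 0) / total
    return adjusted, unseen_mass


if __name__ == "__main__":
    sample = "a a a b b c c d e f".split()
    adjusted, unseen = good_turing_counts(sample)
    print(adjusted, unseen)
```

Because of the raw-count fallback, the adjusted counts here are not renormalized into a proper distribution; a practical system would typically smooth the N_r values first and renormalize afterwards.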