A COMPRESSION METHOD USED IN LANGUAGE MODELING FOR HANDHELD DEVICES

Genqing Wu, Fang Zheng, Wenhu Wu
2002-01-01
Abstract: In this paper, a new n-gram language model compression method is proposed for applications on handheld devices such as mobile phones, PDAs, and handheld PCs. Compared with traditional methods, the proposed method compresses the model to a great extent while preserving good performance. The method has three parts. First, the language model parameters are analyzed in detail, and a criterion based on the probability and the importance of n-grams is used to determine which n-grams should be kept and which should be removed. Second, a curve-based compression function is proposed to compress the n-gram count values in the full language model. Third, a code table is extracted and used to estimate the probabilities of bigrams. Our experiments show that with this compression method the language model can be reduced dramatically, to only about 1 MB, with almost no loss in performance. This makes the language model usable on handheld devices.
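The three parts named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration and is not the authors' method: the pruning score (count times the absolute log-ratio of bigram to unigram probability), the logarithmic shape of the compression curve, the 256-level code size, and all function names (`prune_bigrams`, `compress_count`, `build_code_table`) are hypothetical.

```python
import math
from collections import defaultdict


def prune_bigrams(bigram_counts, unigram_counts, threshold):
    """Keep bigrams whose score reaches the threshold.

    Assumed criterion (not from the paper): score = count * |log P(w2|w1) - log P(w2)|,
    i.e. frequent bigrams that differ strongly from the unigram estimate are kept.
    """
    total = sum(unigram_counts.values())
    kept = {}
    for (w1, w2), count in bigram_counts.items():
        p_bigram = count / unigram_counts[w1]      # maximum-likelihood P(w2 | w1)
        p_unigram = unigram_counts[w2] / total     # maximum-likelihood P(w2)
        score = count * abs(math.log(p_bigram) - math.log(p_unigram))
        if score >= threshold:
            kept[(w1, w2)] = count
    return kept


def compress_count(count, max_count, levels=256):
    """Map a raw count to a small integer code along an assumed logarithmic curve."""
    return round(math.log1p(count) / math.log1p(max_count) * (levels - 1))


def build_code_table(counts, max_count, levels=256):
    """Map each code to the mean of the raw counts that fall into it.

    The stored codes plus this table stand in for the raw counts when
    bigram probabilities are estimated.
    """
    buckets = defaultdict(list)
    for count in counts:
        buckets[compress_count(count, max_count, levels)].append(count)
    return {code: sum(values) / len(values) for code, values in buckets.items()}
```

Under these assumptions each kept bigram stores a one-byte code instead of a full count, and a probability estimate is recovered by looking the code up in the table and dividing by the count of the history word. A logarithmic curve is chosen here only because it gives low counts finer resolution than high counts; the paper's actual function may differ.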