Entropy of Chinese and the perplexity of the language models

Jun Wu, Zuoying Wang
1996-01-01
Tien Tzu Hsueh Pao/Acta Electronica Sinica
Abstract: A method for estimating an upper bound on the entropy of printed Chinese is presented. An upper bound of 5.17 bits per character is obtained by computing the entropy of a sample Chinese corpus. The perplexity of several language models, a quantitative measure of the ability of a language model, is discussed. A new method for approximating higher-order language models by lower-order ones is also presented.
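To make the link between the two quantities in the abstract concrete, here is a minimal sketch. It assumes the standard relation perplexity = 2^H (with H in bits per character) and a simple unigram character model. The function names and the toy sample are hypothetical, and this is not the estimation procedure used in the paper.

```python
import math
from collections import Counter


def unigram_entropy_bits(text: str) -> float:
    """Empirical entropy in bits per character under a unigram model.

    Assumption: characters are treated as independent draws, which is
    only the crudest kind of language model.
    """
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def perplexity(entropy_bits: float) -> float:
    """Perplexity for a per-character entropy in bits (assumed PP = 2**H)."""
    return 2.0 ** entropy_bits


# Hypothetical toy sample, not the paper's corpus: two equally frequent symbols.
toy_entropy = unigram_entropy_bits("ABAB")
toy_perplexity = perplexity(toy_entropy)

# Perplexity implied by the 5.17 bits/character bound quoted in the abstract.
bound_perplexity = perplexity(5.17)
```

Under this assumed relation, the 5.17 bits/character bound corresponds to a perplexity of roughly 36, that is, about as much uncertainty as choosing uniformly among 36 characters at each position. Because 5.17 is an upper bound on the entropy, this figure is likewise an upper bound.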