Estimating the Minimum Entropy of Chinese and Japanese Languages.

FJ Ren, K Yen
DOI: https://doi.org/10.1142/s0219622005001702
2005-01-01
International Journal of Information Technology & Decision Making
Abstract: The minimum entropy of a natural language has been an interesting research subject. Great progress has been made for English, but few reports on other languages can be found in the literature. Based on two hypotheses on the conservation of information quantity, we propose a method for estimating the minimum entropy of characters in natural languages. Given a large translation corpus, this method lets us estimate the minimum entropy without calculating probabilities. Moreover, as the translation corpus grows, the fluctuation of the ratio between the character counts of any two languages becomes negligible. In this paper, we apply the method to two languages with large character sets: Japanese and Chinese.
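The abstract does not state the two hypotheses or the estimator itself, so the following is only a minimal sketch of one plausible reading: if a text and its translation carry the same total information, then entropy per character times character count should be equal on both sides. The function names, the whitespace-stripping rule, and every number in the usage note are assumptions made for illustration and are not taken from the paper.

```python
def count_chars(text: str) -> int:
    """Count non-whitespace characters (assumed counting rule, not the paper's)."""
    return sum(1 for ch in text if not ch.isspace())


def estimate_entropy(ref_entropy_bits: float, ref_chars: int, target_chars: int) -> float:
    """Estimate per-character entropy of a target language from a reference language.

    Assumes conservation of information across translation:
        ref_entropy_bits * ref_chars == target_entropy_bits * target_chars
    This is a hypothetical reading of the abstract, not the authors' stated formula.
    """
    if ref_chars <= 0 or target_chars <= 0:
        raise ValueError("character counts must be positive")
    return ref_entropy_bits * ref_chars / target_chars
```

As a usage illustration with made-up placeholder values: `estimate_entropy(1.0, 300, 100)` returns `3.0`, meaning that if a reference text of 300 characters at an assumed 1.0 bit per character translates into 100 target characters, this reading assigns 3.0 bits to each target character. No probability distribution over characters is computed, which matches the abstract's claim; only the character-count ratio from the parallel corpus is needed.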