Research on the Methods of Chinese Text Classification Using Bayes and Language Model

Tao Yan, Guang-Lai Gao
DOI: https://doi.org/10.1109/ccpr.2008.88
2008-01-01
Abstract: As the volume of information on the Internet grows, obtaining useful information quickly and effectively has become an important task, and automatic information classification has emerged to meet this need. The Bayes classifier is one classification method that has been used in many fields. This paper applies a classification model that combines the Bayes classifier with a language model to Chinese text classification. In experiments on the Fudan University Chinese corpus, the improved classifiers, which use four smoothing methods, perform better than the naive Bayes classifier model. In particular, the Jelinek-Mercer method with a modified smoothing scale improves classifier performance considerably.
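To make the combination of a Bayes classifier with a smoothed language model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a unigram language model per class, documents that are already segmented into tokens, and the standard Jelinek-Mercer interpolation P(w|c) = (1 - lam) * P_ml(w|c) + lam * P(w|corpus). The function names `train` and `classify`, the fixed value of `lam`, and the choice to skip tokens never seen in training are all assumptions made for illustration; the abstract does not specify the paper's modified smoothing scale.

```python
import math
from collections import Counter


def train(docs):
    """Collect counts from (tokens, label) pairs.

    Hypothetical helper: tokens are assumed to be pre-segmented words.
    """
    class_counts = {}        # label -> Counter of token counts in that class
    doc_counts = Counter()   # label -> number of training documents
    corpus = Counter()       # token counts over the whole collection
    for tokens, label in docs:
        class_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        corpus.update(tokens)
    return class_counts, doc_counts, corpus


def classify(tokens, model, lam=0.5):
    """Return the label with the highest log posterior.

    lam is the Jelinek-Mercer interpolation weight and should lie
    strictly between 0 and 1 (an assumed fixed value, not the paper's).
    """
    class_counts, doc_counts, corpus = model
    corpus_total = sum(corpus.values())
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in class_counts.items():
        class_total = sum(counts.values())
        # Log prior of the class, estimated from document frequencies.
        score = math.log(doc_counts[label] / n_docs)
        for w in tokens:
            if w not in corpus:
                continue  # assumption: ignore tokens never seen in training
            p_class = counts[w] / class_total      # maximum-likelihood estimate
            p_corpus = corpus[w] / corpus_total    # collection (background) model
            score += math.log((1 - lam) * p_class + lam * p_corpus)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because the collection model assigns nonzero probability to every token seen anywhere in training, a word that is absent from one class no longer drives that class's probability to zero, which is the failure of the unsmoothed naive Bayes estimate that smoothing is meant to address.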