A Chinese Text Classifier Based on N-Gram Language Model and Chain Augmented Naïve Bayesian Classifier

MAO Wei, XU Wei-ran, GUO Jun
DOI: https://doi.org/10.3969/j.issn.1003-0077.2006.03.005
2006-01-01
Abstract: An automatic Chinese text categorization method based on an n-gram language model and a chain augmented naïve Bayesian classifier is proposed. The paper introduces how a text is represented through an n-gram language model, argues for the advantage of combining the n-gram language model with the chain augmented naïve Bayesian classifier, analyzes how to choose the parameters of the n-gram language model, and discusses several crucial problems of the categorization system. The effect of the quantity and quality of the training corpus on classifier performance is also studied experimentally. The categorization system is tested on the 863-project data set for Chinese text categorization, and the experimental results show that the system performs well.
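To make the combination concrete, here is a minimal sketch of the general idea, not the authors' implementation. A chain augmented naïve Bayesian classifier of this kind trains one n-gram language model per category and assigns a text to the category c that maximizes log P(c) + Σ log P(w_i | w_{i-n+1} … w_{i-1}, c). Several details are assumptions made for illustration: the models work on characters (which avoids Chinese word segmentation), n = 2, and add-one smoothing is used. The paper analyzes its own parameter choices, which this sketch does not reproduce. The class names `CharNGramLM` and `CANBayesClassifier` and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class CharNGramLM:
    """Character n-gram language model with add-one smoothing (assumed, for illustration)."""

    def __init__(self, n, vocab):
        self.n = n
        self.vocab = vocab  # character vocabulary shared by all category models
        self.ngram_counts = Counter()
        self.context_counts = Counter()

    def _padded(self, text):
        # Prepend n-1 start symbols so every character has a full-length history.
        return ["<s>"] * (self.n - 1) + list(text)

    def train(self, texts):
        for text in texts:
            chars = self._padded(text)
            for i in range(self.n - 1, len(chars)):
                context = tuple(chars[i - self.n + 1:i])
                self.ngram_counts[context + (chars[i],)] += 1
                self.context_counts[context] += 1

    def log_prob(self, text):
        chars = self._padded(text)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen characters
        total = 0.0
        for i in range(self.n - 1, len(chars)):
            context = tuple(chars[i - self.n + 1:i])
            count = self.ngram_counts[context + (chars[i],)]
            total += math.log((count + 1) / (self.context_counts[context] + v))
        return total


class CANBayesClassifier:
    """One n-gram LM per category; predict argmax of log prior + LM log-likelihood."""

    def __init__(self, n=2):
        self.n = n

    def fit(self, texts, labels):
        vocab = {ch for text in texts for ch in text}
        by_class = defaultdict(list)
        for text, label in zip(texts, labels):
            by_class[label].append(text)
        self.models, self.log_priors = {}, {}
        for label, docs in by_class.items():
            lm = CharNGramLM(self.n, vocab)
            lm.train(docs)
            self.models[label] = lm
            self.log_priors[label] = math.log(len(docs) / len(texts))
        return self

    def predict(self, text):
        return max(
            self.models,
            key=lambda c: self.log_priors[c] + self.models[c].log_prob(text),
        )


# Hypothetical toy corpus with two categories.
train_texts = ["股票市场上涨", "股市今天大涨", "球队赢得比赛", "足球比赛精彩"]
train_labels = ["finance", "finance", "sports", "sports"]
clf = CANBayesClassifier(n=2).fit(train_texts, train_labels)
print(clf.predict("股票大涨"))  # → finance
print(clf.predict("足球比赛"))  # → sports
```

Because each category is modeled by its own n-gram chain, the classifier captures dependencies between adjacent characters that a plain naïve Bayes bag-of-words model ignores. The order n and the smoothing method are exactly the kind of language-model parameters whose choice the paper analyzes.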