Text Classification Based on N-gram Language Model

周新栋,王挺
2005-01-01
Journal of Computer Applications
Abstract: Text classification has become a research focus in natural language processing. After reviewing traditional text classification models, this paper presents a method that uses N-gram language models to classify Chinese text. Rather than representing documents as bags of words, the model treats each document as a random observation sequence. A word-level text classifier was implemented with a bigram model, and its performance was compared with that of two traditional models (the Vector Space Model and the Naive Bayes Model). Experimental results show that the N-gram model classifier is more accurate and more stable than these traditional models.
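
To make the idea of scoring a document as a word sequence concrete, the following is a minimal sketch of a word-level bigram classifier. It is not the authors' implementation: the abstract does not say which smoothing method, class prior, or word segmenter was used, so the add-alpha smoothing, the document-frequency prior, the `<s>` and `</s>` boundary markers, and the name `BigramClassifier` are all assumptions made for illustration. Input documents are assumed to be already segmented into words.

```python
import math
from collections import Counter, defaultdict


class BigramClassifier:
    """Hypothetical sketch: one bigram language model per class."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # add-alpha smoothing (an assumption)
        self.bigrams = defaultdict(Counter)   # label -> counts of (prev, word)
        self.contexts = defaultdict(Counter)  # label -> counts of prev
        self.doc_counts = Counter()           # label -> number of training docs
        self.vocab = set()

    @staticmethod
    def _pad(tokens):
        # Boundary markers let the model score the first and last word.
        return ["<s>"] + list(tokens) + ["</s>"]

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            seq = self._pad(tokens)
            self.vocab.update(seq)
            for prev, word in zip(seq, seq[1:]):
                self.bigrams[label][(prev, word)] += 1
                self.contexts[label][prev] += 1
        return self

    def log_score(self, tokens, label):
        # log P(class) + sum over positions of log P(word | prev, class)
        v = len(self.vocab)
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        seq = self._pad(tokens)
        for prev, word in zip(seq, seq[1:]):
            num = self.bigrams[label][(prev, word)] + self.alpha
            den = self.contexts[label][prev] + self.alpha * v
            score += math.log(num / den)
        return score

    def predict(self, tokens):
        # Choose the class whose language model gives the highest score.
        return max(self.doc_counts, key=lambda c: self.log_score(tokens, c))


if __name__ == "__main__":
    # Toy, made-up training data; real use needs a segmented corpus.
    train_docs = [["球队", "赢得", "比赛"], ["股票", "价格", "上涨"]]
    train_labels = ["sports", "finance"]
    clf = BigramClassifier().fit(train_docs, train_labels)
    print(clf.predict(["球队", "赢得", "比赛"]))
```

Unlike a bag-of-words model, the score here depends on word order, because each word is conditioned on its predecessor. Working in log space avoids numerical underflow on long documents.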