Improved VSM for Incremental Text Classification
Zhen Yang, Jianjun Lei, Jian Wang, Xing Zhang, Jun Guo
DOI: https://doi.org/10.1063/1.3037096
2008-01-01
AIP Conference Proceedings
Abstract: As a simple classification method, the vector space model (VSM) has been widely applied in text information processing. Traditional VSM, however, has difficulty selecting a refined vector representation that strikes a good trade-off between complexity and performance, especially in incremental text mining. To address this, the paper discusses several improvements, including VSM variants based on improved TF, TF-IDF, and BM25 weighting. Maximum mutual information feature selection is then introduced to obtain a low-dimensional VSM with lower complexity while keeping acceptable precision. Experimental results on spam filtering and short-message classification show that the algorithm achieves higher precision than existing algorithms under the same conditions.
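To make the ingredients named in the abstract concrete, here is a minimal Python sketch of term weighting followed by mutual-information feature selection. The abstract does not give the paper's formulas, so everything here is an assumption: the sketch uses textbook TF-IDF, a common Okapi BM25 form with conventional defaults (k1 = 1.2, b = 0.75), and the standard mutual information between term presence and class label. It does not reproduce the paper's "improved TF" variants, and all function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Textbook TF-IDF (raw tf * log(N/df)); not the paper's improved variant."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()}
            for d in docs]

def bm25_weights(docs, k1=1.2, b=0.75):
    """A common Okapi BM25 term weight; k1 and b are assumed defaults."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    out = []
    for d in docs:
        norm = k1 * (1 - b + b * len(d) / avgdl)  # document-length normalization
        out.append({
            t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
               * c * (k1 + 1) / (c + norm)
            for t, c in Counter(d).items()
        })
    return out

def mutual_information(docs, labels):
    """Mutual information between term presence (0/1) and class, per term."""
    n = len(docs)
    class_count = Counter(labels)
    joint = Counter()  # (term, label) -> docs of that label containing the term
    df = Counter()     # term -> docs containing the term
    for d, y in zip(docs, labels):
        for t in set(d):
            joint[(t, y)] += 1
            df[t] += 1
    scores = {}
    for t in df:
        mi = 0.0
        for y, ny in class_count.items():
            n11 = joint[(t, y)]
            # Sum over term present / term absent for this class.
            for nxy, nx in ((n11, df[t]), (ny - n11, n - df[t])):
                if nxy > 0:
                    mi += (nxy / n) * math.log(nxy * n / (nx * ny))
        scores[t] = mi
    return scores

def select_features(docs, labels, k):
    """Keep the k highest-MI terms (ties broken alphabetically)."""
    scores = mutual_information(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def project(weights, features):
    """Map one {term: weight} dict onto the reduced feature space."""
    return [weights.get(t, 0.0) for t in features]

# Toy data, invented for illustration only.
docs = [["free", "win", "money"], ["win", "prize", "free"],
        ["meeting", "tomorrow", "free"], ["lunch", "meeting", "today"]]
labels = ["spam", "spam", "ham", "ham"]

features = select_features(docs, labels, 2)   # ['meeting', 'win']
vectors = [project(w, features) for w in tfidf_weights(docs)]
```

In this toy example, "free" appears in both classes and scores lower than "win" or "meeting", which each occur in only one class, so the reduced space keeps the two discriminative terms. In an incremental setting the document-frequency and joint counts could be updated as new messages arrive, although how the paper handles this is not stated in the abstract.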