A Comparative Study on Feature Selection in Chinese Text Classification Problem

Hu Li, Peng Zou, Weihong Han
DOI: https://doi.org/10.1109/nces.2012.6544065
2013-01-01
Applied Mechanics and Materials
Abstract: The information explosion poses many challenges for text classification. The curse of dimensionality sharply increases computational complexity and lowers classification accuracy, so it is critical to apply feature selection before the actual classification. Automatic classification of English text has been studied for many years, but Chinese text has received little attention. In this paper, several classic feature selection methods, namely term frequency (TF), information gain (IG), and the chi-square statistic (CHI), are compared on Chinese text classification. We also take imbalanced data into consideration. Experimental results show that CHI performs better than IG and TF when the dataset is imbalanced, but there is no obvious difference among the methods on balanced data.
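To make the CHI criterion concrete, here is a minimal sketch of chi-square feature selection for a single target class. The abstract does not give the authors' exact formulation, so this uses the standard chi-square statistic over a 2x2 term/class contingency table. The function names `chi_square` and `select_features` are hypothetical, and documents are assumed to be already segmented into word tokens (Chinese text needs a word segmentation step first, which is not shown).

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for one term and one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. term in every document): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, target, k):
    """Return the k terms with the highest chi-square score for `target`.

    docs is a list of token lists (assumed pre-segmented); labels is the
    parallel list of class labels. Ties are broken alphabetically.
    """
    n_pos = sum(1 for y in labels if y == target)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count document frequency, so each term counts once per document.
        (pos_df if y == target else neg_df).update(set(tokens))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        a, b = pos_df[term], neg_df[term]
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

A term that appears in every document of one class and in none of the other gets the highest score, while a term spread evenly across classes scores zero, which is why such a criterion can discard frequent but uninformative words that a pure TF ranking would keep.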