A Survey On Dimension Reduction Techniques In Text Classification

Zhi Juan Wang, Ruo Song Zhou
2015-01-01
Abstract: Dimension reduction is one of the key steps in text classification. Feature selection and feature extraction are the two common approaches to dimension reduction. In this paper, we discuss dimension reduction techniques in two groups: traditional methods (Information Gain, Mutual Information, Document Frequency, Correlation Coefficient) and new methods (Optimization Mutual Information Based on Word Frequency, CDF (Concentration, Dispersion and Frequency), Semantic Relatedness). We then analyze the principles of these methods and illustrate their advantages and disadvantages.
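To make two of the traditional feature selection methods named in the abstract concrete, the following is a minimal sketch of Document Frequency thresholding and term-class Mutual Information. It uses the common textbook definitions, not necessarily the exact formulations analyzed in the paper. The toy corpus `DOCS` and the function names `document_frequency`, `select_by_df`, and `mutual_information` are assumptions introduced here for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (tokens, class label) pairs, for illustration only.
DOCS = [
    (["goal", "match", "team"], "sport"),
    (["team", "coach", "goal"], "sport"),
    (["stock", "market", "team"], "finance"),
    (["market", "bank", "stock"], "finance"),
]


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens, _ in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def select_by_df(docs, min_df):
    """Keep only terms that occur in at least min_df documents."""
    df = document_frequency(docs)
    return sorted(t for t, count in df.items() if count >= min_df)


def mutual_information(docs, term, label):
    """Estimate log(P(t, c) / (P(t) * P(c))) from document counts."""
    n = len(docs)
    n_t = sum(1 for tokens, _ in docs if term in tokens)
    n_c = sum(1 for _, c in docs if c == label)
    n_tc = sum(1 for tokens, c in docs if term in tokens and c == label)
    if n_tc == 0:
        # Term and class never co-occur: the log ratio is unbounded below.
        return float("-inf")
    return math.log((n_tc * n) / (n_t * n_c))


if __name__ == "__main__":
    print(select_by_df(DOCS, min_df=2))
    print(mutual_information(DOCS, "goal", "sport"))
    print(mutual_information(DOCS, "team", "sport"))
```

In this toy corpus, "goal" appears only in the "sport" class and therefore scores higher against that class than "team", which appears in both classes. Document Frequency alone would rank "team" highest, which illustrates why a frequency threshold and a class-aware score can select different features.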