Optimized Approach of Feature Selection Based on Information Gain

Guohua Wu, Junjun Xu
DOI: https://doi.org/10.1109/csma.2015.38
2015-01-01
Abstract: Text feature selection is a key technology in text classification and text information retrieval. Information gain is a feature selection method widely used in text categorization. This paper theoretically analyzes the deficiencies of information gain as a feature selection method and then introduces two improvement factors, LDFWF (Limiting Document Frequency's Word Frequency) and DI (Distribution Information). On this basis, an improved text feature selection method is proposed. In the experiments, an SVM classifier was used for text classification, with features selected by standard information gain and by the improved information gain proposed in this paper, respectively. The results show that the proposed method effectively improves the accuracy of text classification.
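For readers unfamiliar with the baseline being improved, the sketch below computes standard (unmodified) information gain for a term over document classes and keeps the top-k terms. It assumes the common document-frequency formulation, IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), where a document "contains" t if the term occurs in it at least once. Documents are assumed to be pre-tokenized sets of words, and the helper names (`entropy`, `information_gain`, `select_features`) are hypothetical. The abstract does not define how LDFWF and DI modify the score, so neither factor is implemented here; this is only the baseline.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """Baseline information gain of `term` with respect to the class labels.

    IG(t) = H(C) - P(t) * H(C | t) - P(not t) * H(C | not t)
    Assumption: presence/absence (document frequency) only; how often the
    term occurs inside a document is ignored.
    """
    classes = sorted(set(labels))
    n = len(docs)
    with_t = Counter(y for d, y in zip(docs, labels) if term in d)
    without_t = Counter(y for d, y in zip(docs, labels) if term not in d)
    all_counts = Counter(labels)

    n_t = sum(with_t.values())      # number of documents containing the term
    n_not_t = n - n_t               # number of documents without the term
    h_c = entropy([all_counts[c] for c in classes])
    h_c_t = entropy([with_t[c] for c in classes])
    h_c_not_t = entropy([without_t[c] for c in classes])
    return h_c - (n_t / n) * h_c_t - (n_not_t / n) * h_c_not_t


def select_features(docs, labels, k):
    """Rank the vocabulary by information gain (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


if __name__ == "__main__":
    # Toy corpus (hypothetical): two "sport" and two "finance" documents.
    docs = [
        {"goal", "match", "team"},
        {"match", "score"},
        {"stock", "market"},
        {"market", "price", "match"},
    ]
    labels = ["sport", "sport", "finance", "finance"]
    print(select_features(docs, labels, 1))  # prints ['market']
```

In this toy corpus "market" separates the two classes perfectly, so its gain equals the full class entropy of 1 bit, while "match" appears in both classes and scores lower. The selected terms would then be used as the feature space for a classifier such as an SVM.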