Study on Information Gain-Based Feature Selection in Chinese Text Categorization

Xiaoxia Liu
2012-01-01
Computer Engineering and Applications Journal
Abstract: This paper analyses a shortcoming of the traditional Information Gain (IG) feature selection method: it ignores how a feature is distributed within a class and across classes. Within-class distribution information and between-class concentration information are introduced to identify features that are strongly correlated with a class. Traditional IG also does not combine positive and negative features well. To address this, the ratio of positive to negative features is introduced together with a proportional factor that balances the effects of feature presence and absence. This reduces the influence of negative features on corpora with unevenly distributed categories and improves classification performance. Experimental results verify the effectiveness and feasibility of the improved IG approach.
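To make the positive/negative distinction concrete, the sketch below splits the standard IG of a term into the part contributed by documents where the term appears and the part contributed by documents where it is absent. This is the textbook decomposition, not the paper's improved formula, which the abstract does not give. The function names `ig_parts` and `weighted_ig` and the weight `lam` are hypothetical, with `lam` standing in for a proportional factor of the kind the abstract mentions.

```python
import math


def ig_parts(present, totals):
    """Split standard information gain of one term into two parts.

    present[c] = number of documents of class c that contain the term
    totals[c]  = number of documents in class c
    Returns (pos, neg) in bits: the contribution of term presence and
    the contribution of term absence. pos + neg equals the usual IG.
    """
    n = sum(totals)        # all documents
    n_t = sum(present)     # documents containing the term
    pos = neg = 0.0
    for a, m in zip(present, totals):
        p_c = m / n
        p_tc = a / n                 # P(term present, class c)
        p_ntc = (m - a) / n          # P(term absent, class c)
        if p_tc > 0:
            pos += p_tc * math.log2(p_tc / ((n_t / n) * p_c))
        if p_ntc > 0:
            neg += p_ntc * math.log2(p_ntc / (((n - n_t) / n) * p_c))
    return pos, neg


def weighted_ig(present, totals, lam=0.5):
    """Hypothetical re-weighting (an assumption, not the paper's formula):
    lam scales the presence part and (1 - lam) the absence part."""
    pos, neg = ig_parts(present, totals)
    return lam * pos + (1 - lam) * neg
```

With two balanced classes of 10 documents each, a term that occurs in every document of one class and none of the other gets 0.5 bit from presence and 0.5 bit from absence. Raising `lam` above 0.5 shifts the score toward the presence part, which is one simple way to limit the role of absent terms when categories are unevenly sized.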