Comparison and Improvement of Feature Selection Method for Text Categorization

Chengjie Sun
2011-01-01
Abstract: Feature selection is closely related to the performance of text categorization systems. In this paper, we measured the effects of several popular feature selection methods on the performance of a text categorization system for the tourism domain. Of the five methods evaluated, the three with the best performance were chosen: expected cross entropy, information gain, and mutual information. Based on theoretical analysis and experiments, we modified each of the three methods. Experimental results showed that the modified expected cross entropy method performed better than the others in our application.
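The abstract does not give the formulas for the three selected criteria or describe the authors' modifications. For reference, the sketch below computes the commonly cited unmodified forms from document frequencies. These are textbook definitions, not the paper's modified methods. Mutual information uses the max-over-classes variant, which is one of several in use. The toy corpus and all function names are hypothetical.

```python
import math
from collections import Counter


def term_stats(docs, term):
    """Document-frequency estimates for one term (hypothetical helper).

    docs is a list of (set_of_terms, class_label) pairs.
    Returns P(c), P(t), P(c|t) and P(c|~t).
    """
    n = len(docs)
    class_counts = Counter(label for _, label in docs)
    with_t = Counter(label for terms, label in docs if term in terms)
    n_t = sum(with_t.values())
    n_not_t = n - n_t
    p_c = {c: k / n for c, k in class_counts.items()}
    p_t = n_t / n
    p_c_given_t = {c: (with_t[c] / n_t if n_t else 0.0) for c in class_counts}
    p_c_given_not_t = {
        c: ((class_counts[c] - with_t[c]) / n_not_t if n_not_t else 0.0)
        for c in class_counts
    }
    return p_c, p_t, p_c_given_t, p_c_given_not_t


def plogp(p):
    # Convention: 0 * log 0 = 0.
    return p * math.log(p) if p > 0 else 0.0


def information_gain(docs, term):
    # IG(t) = -sum P(c)logP(c) + P(t) sum P(c|t)logP(c|t) + P(~t) sum P(c|~t)logP(c|~t)
    p_c, p_t, p_ct, p_cnt = term_stats(docs, term)
    return (-sum(plogp(p) for p in p_c.values())
            + p_t * sum(plogp(p) for p in p_ct.values())
            + (1 - p_t) * sum(plogp(p) for p in p_cnt.values()))


def expected_cross_entropy(docs, term):
    # ECE(t) = P(t) * sum_c P(c|t) * log(P(c|t) / P(c)); only documents containing t count.
    p_c, p_t, p_ct, _ = term_stats(docs, term)
    return p_t * sum(p * math.log(p / p_c[c]) for c, p in p_ct.items() if p > 0)


def mutual_information(docs, term):
    # MI_max(t) = max_c log(P(t|c) / P(t)), rewritten as max_c log(P(c|t) / P(c)).
    p_c, p_t, p_ct, _ = term_stats(docs, term)
    if p_t == 0:
        return float("-inf")  # the term never occurs, so MI is undefined
    return max(math.log(p / p_c[c]) for c, p in p_ct.items() if p > 0)


# Hypothetical toy corpus: each document is reduced to its set of terms.
docs = [
    ({"beach", "hotel"}, "travel"),
    ({"hotel", "flight"}, "travel"),
    ({"stock", "market"}, "finance"),
    ({"market", "hotel"}, "finance"),
]

if __name__ == "__main__":
    for t in ("beach", "hotel"):
        print(t, information_gain(docs, t),
              expected_cross_entropy(docs, t), mutual_information(docs, t))
```

On this toy corpus, information gain scores "beach" and "hotel" equally, about 0.216 each. This happens because IG also rewards the information carried by a term's absence. Expected cross entropy and mutual information use only the documents that contain the term, and both rank "beach" higher. That difference is one reason the criteria can rank features differently in practice.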