Study on Feature Selection Methods in Chinese Text Categorization

Weihong Deng
2005-01-01
Abstract: This paper presents a study of seven feature selection methods commonly used in text categorization: document frequency, information gain, mutual information, the χ² statistic, expected cross entropy, weight of evidence for text, and odds ratio. To evaluate these methods, experiments were carried out with a Rocchio classifier on the Chinese text collection of the national 863 project. The measured results indicate that the odds ratio method outperforms the other methods.
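To make the seven criteria concrete, the sketch below scores a single term from document counts. It is a minimal illustration that assumes the standard textbook forms of these measures found in the feature-selection literature. The paper's exact formulas, smoothing, and the way per-category scores are combined may differ. The function name `feature_scores`, its count-dictionary inputs, and the `eps` clamp are hypothetical choices made for illustration. The sketch also assumes at least two categories and that every category contains at least one document.

```python
import math
from typing import Dict


def _xlogx(p: float) -> float:
    # Convention 0 * log 0 = 0, used in the entropy sums.
    return p * math.log(p) if p > 0 else 0.0


def _clamp(p: float, eps: float) -> float:
    # Keep probabilities away from 0 and 1 so the log ratios stay finite
    # (a hypothetical smoothing choice, not taken from the paper).
    return min(max(p, eps), 1.0 - eps)


def feature_scores(docs_with_term: Dict[str, int],
                   docs_per_cat: Dict[str, int],
                   eps: float = 1e-12) -> dict:
    """Hypothetical helper: score one term t against every category c.

    docs_with_term[c]: training documents in category c that contain t
    docs_per_cat[c]:   total training documents in category c (assumed > 0)
    """
    n = sum(docs_per_cat.values())
    df = sum(docs_with_term.values())           # DF: documents containing t
    p_t = df / n
    p_not_t = 1.0 - p_t

    ig = -sum(_xlogx(nc / n) for nc in docs_per_cat.values())  # H(C)
    ece = 0.0
    wet = 0.0
    mi, chi2, odds = {}, {}, {}

    for c, nc in docs_per_cat.items():
        a = docs_with_term.get(c, 0)            # in c, contains t
        b = df - a                              # not in c, contains t
        cc = nc - a                             # in c, lacks t
        d = n - df - cc                         # not in c, lacks t
        p_c = _clamp(nc / n, eps)
        p_c_t = a / df if df else 0.0           # P(c | t)
        p_c_not_t = cc / (n - df) if n > df else 0.0   # P(c | not t)

        # Information gain: subtract the conditional entropy H(C | t).
        ig += p_t * _xlogx(p_c_t) + p_not_t * _xlogx(p_c_not_t)

        # Expected cross entropy: P(t) * sum_c P(c|t) log(P(c|t) / P(c)).
        if p_c_t > 0:
            ece += p_t * p_c_t * math.log(p_c_t / p_c)

        # Weight of evidence for text:
        # sum_c P(c) P(t) |log [P(c|t)(1 - P(c))] / [P(c)(1 - P(c|t))]|.
        q = _clamp(p_c_t, eps)
        wet += p_c * p_t * abs(math.log(q * (1 - p_c) / (p_c * (1 - q))))

        # Pointwise mutual information: log P(t, c) / (P(t) P(c)).
        mi[c] = math.log(max(a, eps) * n / (max(df, eps) * nc))

        # Chi-square statistic from the 2x2 contingency table.
        denom = (a + cc) * (b + d) * (a + b) * (cc + d)
        chi2[c] = n * (a * d - cc * b) ** 2 / denom if denom else 0.0

        # Odds ratio: log [P(t|c)(1 - P(t|~c))] / [(1 - P(t|c)) P(t|~c)].
        p_t_c = _clamp(a / nc, eps)
        p_t_nc = _clamp(b / (n - nc), eps) if n > nc else eps
        odds[c] = math.log(p_t_c * (1 - p_t_nc) / ((1 - p_t_c) * p_t_nc))

    return {"DF": df, "IG": ig, "ECE": ece, "WET": wet,
            "MI": mi, "CHI2": chi2, "OR": odds}


if __name__ == "__main__":
    # Toy counts (invented for illustration): 100 documents, two categories.
    scores = feature_scores({"sports": 40, "finance": 5},
                            {"sports": 50, "finance": 50})
    print(scores["DF"], scores["OR"])
```

DF, IG, ECE, and WET produce one global score per term, so they can rank terms directly. MI, χ², and OR produce one score per category. A common practice is to combine these across categories, for example by taking the maximum or a P(c)-weighted average, before keeping the top-ranked terms as features for a classifier such as Rocchio.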