A Feature Selection Method Based on Class Feature Domains for Text Categorization

Shi-qi ZHAO, Yu ZHANG, Ting LIU, Yi-heng CHEN, Yong-guang HUANG, Sheng LI
DOI: https://doi.org/10.3969/j.issn.1003-0077.2005.06.004
2005-01-01
Abstract: Feature selection is one of the key problems in text categorization, and its chief obstacles are noise and sparseness. This paper presents a novel feature selection method based on class feature domains. First, we use the combined feature selection method [1] to remove noisy features from the original feature space and extract candidate features: low-frequency words are removed with the Document Frequency method, and candidate features are then selected with the Mutual Information method. Next, we construct a class feature domain for each class and overcome the sparseness of the training data by merging and strengthening the candidate features that appear in the class feature domains. Experiments show that our method performs much better than several traditional feature selection methods and markedly improves the performance of text categorization systems.
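The candidate-extraction stage described in the abstract (Document Frequency filtering followed by Mutual Information ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_candidates`, the parameters `min_df` and `top_k`, the pointwise form of mutual information, and the choice to score each term by its maximum over classes are all assumptions made for illustration. The class feature domain stage is not sketched, because the abstract does not say how the domains are built or how features are merged and strengthened.

```python
import math
from collections import Counter, defaultdict


def select_candidates(docs, labels, min_df=2, top_k=3):
    """Hypothetical two-stage selection: DF filter, then MI ranking.

    docs   : list of token lists, one per document
    labels : list of class labels, aligned with docs
    min_df : assumed threshold; terms in fewer documents are dropped
    top_k  : assumed number of candidate features to keep
    """
    n_docs = len(docs)
    df = Counter()                      # document frequency per term
    df_by_class = defaultdict(Counter)  # document frequency per (class, term)
    class_sizes = Counter(labels)       # number of documents per class

    for tokens, label in zip(docs, labels):
        for term in set(tokens):        # count each term once per document
            df[term] += 1
            df_by_class[label][term] += 1

    # Stage 1: Document Frequency filter removes low-frequency terms.
    kept = [t for t, n in df.items() if n >= min_df]

    # Stage 2: pointwise mutual information between term and class,
    # log( P(t, c) / (P(t) * P(c)) ), estimated from document counts.
    # Assumption: a term's score is its maximum over all classes.
    scores = {}
    for term in kept:
        best = float("-inf")
        for label, size in class_sizes.items():
            joint = df_by_class[label][term]
            if joint == 0:
                continue                # term never occurs in this class
            pmi = math.log(joint * n_docs / (df[term] * size))
            best = max(best, pmi)
        scores[term] = best

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(kept, key=lambda t: (-scores[t], t))
    return ranked[:top_k], scores
```

Calling `select_candidates(docs, labels)` returns the top-ranked candidate terms together with the score of every term that survived the frequency filter. A term that occurs equally often in every class receives a score near zero, so it ranks below terms concentrated in a single class.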