A New Feature Selection Method for Text Clustering

Xu Junling, Xu Baowen, Zhang Weifeng, Cui Zifeng, Zhang Wei
DOI: https://doi.org/10.1007/s11859-007-0040-x
2007-01-01
Abstract: Feature selection methods have been successfully applied to text categorization but are seldom applied to text clustering because class label information is unavailable. In this paper, a new feature selection method for text clustering based on expectation maximization and cluster validity is proposed. It applies a supervised feature selection method to the intermediate clustering results generated during iterative clustering, and so performs feature selection for text clustering; meanwhile, the Davies-Bouldin index is used to evaluate the intermediate feature subsets indirectly. Feature subsets are then selected according to the curve of the Davies-Bouldin index. Experiments are carried out on several popular datasets, and the results show the advantages of the proposed method.
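The abstract does not give the algorithm's details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes an intermediate clustering has already produced pseudo-labels (the paper obtains these during EM-based iterative clustering, which the sketch does not reproduce). It uses the Fisher score as a stand-in supervised feature scorer, since the abstract does not name the supervised method. It also takes the subset size at the minimum of the Davies-Bouldin curve, which is one assumed reading of "selected according to the curve". The Davies-Bouldin index itself is the standard one: the average, over clusters i, of the maximum over j of (S_i + S_j) / d(c_i, c_j), where S_i is the mean distance of cluster i's members to its centroid c_i. The helper names (`fisher_scores`, `select_by_db_curve`, and so on) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(X: List[Sequence[float]], labels: List[int]) -> float:
    """Davies-Bouldin index; lower means more compact, better separated clusters.
    Requires at least two distinct labels."""
    clusters = sorted(set(labels))
    groups: Dict[int, List[Sequence[float]]] = {
        c: [x for x, l in zip(X, labels) if l == c] for c in clusters
    }
    cents = {c: centroid(groups[c]) for c in clusters}
    # S_c: mean Euclidean distance of the members of cluster c to its centroid.
    scatter = {
        c: sum(euclid(x, cents[c]) for x in groups[c]) / len(groups[c])
        for c in clusters
    }
    total = 0.0
    for i in clusters:
        # R_i = max over j != i of (S_i + S_j) / d(c_i, c_j)
        worst = 0.0
        for j in clusters:
            if i == j:
                continue
            d = euclid(cents[i], cents[j])
            worst = max(worst, (scatter[i] + scatter[j]) / d if d > 0 else math.inf)
        total += worst
    return total / len(clusters)


def fisher_scores(X: List[Sequence[float]], labels: List[int]) -> List[float]:
    """Hypothetical stand-in supervised scorer: Fisher score of each feature
    computed against the pseudo-labels (between-cluster / within-cluster spread)."""
    overall = centroid(X)
    clusters = sorted(set(labels))
    scores = []
    for j in range(len(X[0])):
        num = den = 0.0
        for c in clusters:
            vals = [x[j] for x, l in zip(X, labels) if l == c]
            mu = sum(vals) / len(vals)
            num += len(vals) * (mu - overall[j]) ** 2
            den += sum((v - mu) ** 2 for v in vals)
        scores.append(num / den if den > 0 else (math.inf if num > 0 else 0.0))
    return scores


def select_by_db_curve(
    X: List[Sequence[float]], labels: List[int]
) -> Tuple[List[int], List[float]]:
    """Rank features on the pseudo-labels, evaluate each top-k subset with the
    Davies-Bouldin index, and (assumption) keep the k at the curve's minimum."""
    scores = fisher_scores(X, labels)
    ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
    curve = []
    for k in range(1, len(ranking) + 1):
        subset = ranking[:k]
        projected = [[x[j] for j in subset] for x in X]
        curve.append(davies_bouldin(projected, labels))
    best_k = min(range(len(curve)), key=lambda i: curve[i]) + 1
    return ranking[:best_k], curve


if __name__ == "__main__":
    # Toy term-weight matrix: 6 documents x 3 terms, with pseudo-labels from a
    # hypothetical intermediate clustering. Term 2 carries no cluster signal.
    X = [
        [5.0, 0.0, 1.0],
        [4.0, 1.0, 3.0],
        [5.0, 1.0, 2.0],
        [0.0, 5.0, 2.0],
        [1.0, 4.0, 1.0],
        [0.0, 5.0, 3.0],
    ]
    pseudo_labels = [0, 0, 0, 1, 1, 1]
    selected, curve = select_by_db_curve(X, pseudo_labels)
    print("selected terms:", selected)
    print("DB curve:", [round(v, 3) for v in curve])
```

On this toy matrix the noise term (index 2) has equal means in both clusters, so its Fisher score is 0, and adding it raises the Davies-Bouldin index; it is therefore left out. Taking the minimum of the curve is only one possible rule; a knee or plateau criterion would fit the abstract's description equally well.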