Feature Selection for Text Clustering Based on the Genetic Algorithm

张锋,樊孝忠,许云
DOI: https://doi.org/10.3321/j.issn:1000-565X.2004.z1.030
2004-01-01
Abstract: Because traditional feature selection methods for text clustering cannot find the best feature set, a genetic algorithm is applied to feature selection, since it can reach the global optimal solution and searches efficiently. In this algorithm, each feature combination is treated as a chromosome and encoded as a binary string, and the text set density serves as the fitness function for evaluating each individual (that is, each candidate feature combination). Through the operations of selection, crossover and mutation, the optimal feature set can be obtained rapidly. Experimental results on an open corpus show that feature selection based on the genetic algorithm improves text clustering precision by 5.9% and reduces clustering time by 15 s.
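The scheme described in the abstract (binary chromosomes over features, a density-based fitness, and selection, crossover and mutation) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. The abstract does not define "text set density" or say whether it is maximized or minimized, so the code assumes density is the average cosine similarity between each document and the centroid, and that lower density means higher fitness. Tournament selection, single-point crossover, bit-flip mutation, elitism, all parameter values, and the names `density`, `fitness` and `evolve` are likewise assumptions made for illustration.

```python
import math
import random


def density(docs, mask):
    """Assumed definition: mean cosine similarity of documents to their centroid,
    computed only over the features whose bit in `mask` is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # assumption: an empty feature set counts as maximally dense
    rows = [[doc[j] for j in cols] for doc in docs]
    centroid = [sum(col) / len(rows) for col in zip(*rows)]

    def cosine(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

    return sum(cosine(r, centroid) for r in rows) / len(rows)


def fitness(docs, mask):
    """Assumed direction: a sparser (less dense) document space is fitter."""
    return 1.0 - density(docs, mask)


def evolve(docs, generations=30, pop_size=20, p_cross=0.8, p_mut=0.05, seed=0):
    """Search for a binary feature mask with a simple generational GA."""
    rng = random.Random(seed)
    n = len(docs[0])

    def score(mask):
        return fitness(docs, mask)

    # Each chromosome is a list of 0/1 bits, one bit per candidate feature.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best individual forward unchanged
        while len(nxt) < pop_size:
            # Selection: binary tournament, applied once per parent.
            a = max(rng.sample(pop, 2), key=score)
            b = max(rng.sample(pop, 2), key=score)
            child = a[:]
            # Crossover: single cut point, head from a and tail from b.
            if n > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n)
                child = a[:cut] + b[cut:]
            # Mutation: flip each bit independently with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=score)
    return best


# Toy term-frequency matrix (hypothetical data): 4 documents, 4 features.
docs = [
    [2, 0, 1, 0],
    [3, 0, 1, 0],
    [0, 2, 1, 1],
    [0, 3, 1, 0],
]
best = evolve(docs)
print("selected feature mask:", best, "fitness:", fitness(docs, best))
```

The fitness used here is easy to game (for instance, by selecting features that leave many documents as zero vectors), so a practical version would need the paper's actual density definition and probably a constraint on the number of selected features.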