Application of Genetic Algorithm in Document Clustering

Wei Jian-Xiang, Liu Huai, Sun Yue-hong, Su Xin-Ning
DOI: https://doi.org/10.1109/ITCS.2009.269
2009-01-01
Abstract: After surveying existing methods for document clustering, we put forward a new dynamic method based on the genetic algorithm (GA). K-means is a greedy algorithm that is sensitive to the choice of initial cluster centers and easily converges to a local optimum. The genetic algorithm is a globally convergent algorithm that can find the best cluster centers easily. In traditional document clustering methods, the document similarity matrix is sparse. In this paper, we propose some new formulas that improve on the traditional method. We then make some improvements to the genetic algorithm: all individuals are encoded as floating-point numbers, and the sum of the mean square deviation of intra-class distance is adopted as the objective function. The steps of the algorithm are given in detail. The experimental results show that the accuracy of GA can reach over 98 percent and that it generates better clustering results than K-means.
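The abstract names only two design choices: floating-point encoding of individuals and an intra-class squared-deviation objective. A minimal Python sketch of that idea follows. It is not the authors' implementation. The function names (`objective`, `ga_cluster`), the tournament selection, the arithmetic crossover, the Gaussian mutation, the elitism, and all parameter values are assumptions made for illustration, and documents are assumed to be already represented as equal-length numeric vectors.

```python
import random


def objective(individual, docs, k):
    """Sum of squared distances from each document to its nearest center.

    `individual` is a flat list of k * dim floats (k concatenated centers).
    Lower is better. This is one reading of the abstract's objective.
    """
    dim = len(docs[0])
    centers = [individual[i * dim:(i + 1) * dim] for i in range(k)]
    total = 0.0
    for doc in docs:
        total += min(
            sum((x - c) ** 2 for x, c in zip(doc, center))
            for center in centers
        )
    return total


def ga_cluster(docs, k, pop_size=30, generations=100,
               mutation_rate=0.1, sigma=0.1, seed=0):
    """Search for k cluster centers with a simple real-coded GA (assumed operators)."""
    rng = random.Random(seed)

    def score(ind):
        return objective(ind, docs, k)

    # Each individual starts as k distinct documents, flattened into one list.
    population = [
        [x for doc in rng.sample(docs, k) for x in doc]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=score)
        next_gen = [population[0][:]]  # elitism: keep the best unchanged
        while len(next_gen) < pop_size:
            # Binary tournament selection for each parent.
            p1 = min(rng.sample(population, 2), key=score)
            p2 = min(rng.sample(population, 2), key=score)
            # Arithmetic crossover: a random convex blend of the parents.
            alpha = rng.random()
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            # Gaussian mutation applied gene by gene.
            child = [
                g + rng.gauss(0, sigma) if rng.random() < mutation_rate else g
                for g in child
            ]
            next_gen.append(child)
        population = next_gen
    return min(population, key=score)


if __name__ == "__main__":
    # Toy data: two well-separated groups of 2-D "document" vectors.
    docs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    best = ga_cluster(docs, k=2, generations=50)
    print(best, objective(best, docs, k=2))
```

Because the best individual is carried over unchanged each generation, the objective value of the returned solution never gets worse than that of the best initial individual. The other operators could be swapped for whatever the full paper specifies.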