An Improved Partitioning-Based Web Documents Clustering Method Combining GA with ISODATA

Zhengyu Zhu, Yunyan Tian, Jingqiu Xu, Xin Deng, Xiang Ren
DOI: https://doi.org/10.1109/FSKD.2007.165
2007-01-01
Abstract: Existing partitioning-based clustering algorithms, such as k-means, k-medoids and their variations, are simple in theory and converge quickly, but they usually terminate at a local optimum and are not suited to discovering clusters whose sizes differ greatly. This paper proposes an improved Web document clustering method based on a genetic algorithm (GA) whose mutation operation incorporates ideas from ISODATA [6]. Experiments show that the GA's global search can avoid local optima and that the ISODATA-based mutation gives the improved clustering algorithm a self-adjusting ability to discover clusters of different sizes.
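
The abstract does not specify the mutation operator, so the following is only a minimal sketch of what an ISODATA-style mutation could look like, not the authors' method. It assumes a chromosome is a list of centroid tuples and borrows the two classic ISODATA adjustments: split a cluster whose spread is too large, otherwise merge two centroids that are too close. The names `assign` and `isodata_mutate`, the thresholds `split_std` and `merge_dist`, and the use of low-dimensional tuples in place of real document vectors are all assumptions made for illustration.

```python
import math


def assign(points, centroids):
    """Group points by the index of their nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def isodata_mutate(centroids, points, split_std=1.0, merge_dist=0.5):
    """Return a mutated copy of a chromosome (a list of centroid tuples).

    Hypothetical operator: thresholds and the split/merge order are assumptions.
    """
    clusters = assign(points, centroids)
    dims = len(centroids[0])

    # Split step: find the cluster and dimension with the largest standard deviation.
    best = None  # (std, cluster index, dimension)
    for k, members in enumerate(clusters):
        if len(members) < 2:
            continue
        for d in range(dims):
            mean = sum(p[d] for p in members) / len(members)
            std = math.sqrt(sum((p[d] - mean) ** 2 for p in members) / len(members))
            if best is None or std > best[0]:
                best = (std, k, d)
    if best is not None and best[0] > split_std:
        std, k, d = best
        c = centroids[k]
        # Replace the centroid with two copies shifted by +/- std along dimension d.
        low = c[:d] + (c[d] - std,) + c[d + 1:]
        high = c[:d] + (c[d] + std,) + c[d + 1:]
        return centroids[:k] + [low, high] + centroids[k + 1:]

    # Merge step: join the closest pair of centroids if they are nearer than merge_dist.
    pairs = [(math.dist(centroids[i], centroids[j]), i, j)
             for i in range(len(centroids)) for j in range(i + 1, len(centroids))]
    if pairs:
        dist, i, j = min(pairs)
        if dist < merge_dist:
            ni, nj = len(clusters[i]), len(clusters[j])
            # Weight by cluster size; fall back to a plain average if both are empty.
            wi, wj = (ni, nj) if ni + nj > 0 else (1, 1)
            merged = tuple((wi * a + wj * b) / (wi + wj)
                           for a, b in zip(centroids[i], centroids[j]))
            rest = [c for idx, c in enumerate(centroids) if idx not in (i, j)]
            return rest + [merged]

    # Neither condition holds: the chromosome is returned unchanged.
    return list(centroids)
```

Because the operator can change the number of centroids, a GA using it would search over the number of clusters as well as their positions, which is one plausible reading of the "self-adjusting ability" the abstract describes. In a full GA this function would be applied to a chromosome with some mutation probability, alongside selection and crossover, none of which are shown here.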