A hybrid approach using genetic algorithm and the differential evolution heuristic for enhanced initialization of the k-means algorithm with applications in text clustering

D. Mustafi, G. Sahoo
DOI: https://doi.org/10.1007/s00500-018-3289-4
IF: 3.732
2018-06-07
Soft Computing
Abstract: In this paper, we propose a heuristic-based algorithm to improve the initial seeding of the k-means clustering algorithm. The proposed algorithm primarily aims to improve the initial choice of the centroids used by the k-means algorithm and to ensure that the requisite number of clusters is returned in every run of the algorithm. Thus, the use of the proposed algorithm significantly reduces the possibility of k-means converging to a locally optimal solution. The paper explores the genetic algorithm framework to obtain the original seed points and couples this with the differential evolution heuristic to obtain the requisite number of clusters. We have examined the performance of the proposed algorithm on the clustering of text documents, because such corpora often contain a large number of data points and also require the formation of a large number of clusters. The results obtained have been compared with basic implementations of the k-means algorithm using standard parameters.
computer science, artificial intelligence, interdisciplinary applications
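
The abstract gives no implementation details, so the following is only a rough sketch of the general idea of evolving k-means seeds, not the authors' procedure. It is simplified to use differential evolution (DE/rand/1/bin) alone, without the genetic algorithm stage, and it assumes a fitness function of within-cluster squared error plus a large penalty for each empty cluster. All names and parameters (`de_seed`, `fitness`, `pop_size`, `f`, `cr`, the toy data) are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def unflatten(vec, dim):
    """Split a flat parameter vector into a list of centroids."""
    return [vec[i:i + dim] for i in range(0, len(vec), dim)]

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def fitness(points, centroids, penalty=1e6):
    """Assumed objective: squared error plus a penalty per empty cluster."""
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return sse + penalty * (len(centroids) - len(set(labels)))

def de_seed(points, k, pop_size=12, generations=40, f=0.5, cr=0.9, seed=0):
    """Search for k seed centroids with DE/rand/1/bin (illustrative only)."""
    rng = random.Random(seed)
    dim = len(points[0])
    n = k * dim
    # Each individual is k data points flattened into one vector.
    pop = [[x for p in rng.sample(points, k) for x in p] for _ in range(pop_size)]
    scores = [fitness(points, unflatten(ind, dim)) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n)  # guarantees at least one mutated gene
            trial = [pop[a][j] + f * (pop[b][j] - pop[c][j])
                     if (rng.random() < cr or j == j_rand) else pop[i][j]
                     for j in range(n)]
            score = fitness(points, unflatten(trial, dim))
            if score <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, score
    best = min(range(pop_size), key=lambda i: scores[i])
    return unflatten(pop[best], dim)

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd iterations starting from the supplied seeds."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else old)
        centroids = updated
    return centroids, assign(points, centroids)

def make_blobs(seed=1):
    """Toy 2-D data: three well-separated groups of ten points each."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [[cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)]
            for cx, cy in centers for _ in range(10)]

points = make_blobs()
seeds = de_seed(points, k=3)
centroids, labels = kmeans(points, seeds)
print(len(set(labels)), "non-empty clusters")
```

Because the initial population is drawn from distinct data points and selection never accepts a worse individual, a candidate with an empty cluster cannot displace a valid one while the penalty exceeds the achievable squared error. That is one simple way to keep the requested number of clusters; the paper's own mechanism may differ.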