K*-Means: An Efficient Clustering Algorithm with Adaptive Decision Boundaries

Jianwu Long, Luping Liu
DOI: https://doi.org/10.1007/s10766-024-00779-8
2024-11-11
International Journal of Parallel Programming
Abstract: Conventional k-means algorithms often carry a heavy computational burden and depend strongly on the predefined number of clusters k. This paper therefore proposes the k*-means algorithm, which borrows the idea of the perceptron classification algorithm to transform the distance-based clustering task into a classification problem, significantly improving clustering efficiency. The paper also combines the k*-means algorithm with hierarchical clustering methods that can automatically identify the number of clusters: an initial clustering is performed by the k*-means algorithm with a large preset number of clusters, and the resulting sub-clusters are then merged through hierarchical clustering. Experimental results show that the proposed k*-means method has significant advantages on large-scale datasets. It greatly reduces the number of distance calculations and achieves better runtime than the latest accelerated k-means algorithms. When combined with hierarchical clustering, the k*-means algorithm also shows notable performance on both the four synthetic datasets and the four real datasets. Future work could explore parallelization techniques to further improve its scalability and efficiency on even larger datasets.
computer science, theory & methods
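
The two-stage idea in the abstract (over-cluster with a large preset k, then merge sub-clusters hierarchically) can be illustrated with a minimal sketch. The abstract does not describe the perceptron-based k*-means procedure, so this sketch swaps in plain Lloyd's k-means for the first stage, and it assumes single-linkage merging of sub-cluster centers cut at a distance threshold for the second. The function names and the `merge_dist` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means. A stand-in only, NOT the paper's k*-means."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def assign(c):
        # Distance from every point to every center, then nearest center.
        dists = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center in place
                centers[j] = members.mean(axis=0)
    return centers, assign(centers)


def merge_centers(centers, merge_dist):
    """Single-linkage merge (assumed here): link centers closer than merge_dist."""
    k = len(centers)
    parent = list(range(k))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(k):
        for j in range(i + 1, k):
            if np.linalg.norm(centers[i] - centers[j]) < merge_dist:
                parent[find(i)] = find(j)

    # Relabel the union-find roots as consecutive group ids 0, 1, 2, ...
    roots = sorted({find(i) for i in range(k)})
    remap = {r: n for n, r in enumerate(roots)}
    return np.array([remap[find(i)] for i in range(k)])


def overcluster_then_merge(X, k_init, merge_dist, seed=0):
    """Stage 1: many small sub-clusters. Stage 2: merge them into groups."""
    centers, sub_labels = lloyd_kmeans(X, k_init, seed=seed)
    group_of_center = merge_centers(centers, merge_dist)
    # Each point inherits the merged group of its sub-cluster.
    return group_of_center[sub_labels]
```

Calling `overcluster_then_merge(X, k_init=10, merge_dist=10.0)` on an `(n, d)` array returns one merged label per point. The number of final clusters is not passed in; it falls out of how many groups of sub-cluster centers remain linked under the threshold, which is the role the abstract assigns to the hierarchical stage.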