Clustering Large Datasets by Merging K-Means Solutions

Volodymyr Melnykov, Semhar Michael
DOI: https://doi.org/10.1007/s00357-019-09314-8
IF: 1.333
2019-03-29
Journal of Classification
Abstract: Existing clustering methods range from simple but very restrictive to complex but more flexible. The K-means algorithm is one of the most popular clustering procedures due to its computational speed and intuitive construction. Unfortunately, the application of K-means in its traditional form based on Euclidean distances is limited to cases with spherical clusters of approximately the same volume and spread of points. Recent developments in the area of merging mixture components for clustering show good promise. We propose a general framework for hierarchical merging based on pairwise overlap between components which can be readily applied in the context of the K-means algorithm to produce meaningful clusters. Such an approach preserves the main advantage of the K-means algorithm: its speed. The developed ideas are illustrated on examples, studied through simulations, and applied to the problem of digit recognition.
Mathematics, Interdisciplinary Applications; Psychology, Mathematical