K and starting means for k-means algorithm

Ahmed Fahim
DOI: https://doi.org/10.1016/j.jocs.2021.101445
IF: 3.817
2021-10-01
Journal of Computational Science
Abstract: The k-means method aims to divide a set of N objects into k clusters, where each cluster is represented by the mean value of its objects. The algorithm is simple, converges quickly to a local minimum, and has linear time complexity. However, it requires the number of clusters in advance, which presupposes some prior knowledge of the data, and it requires initial centers to be selected; both k and the initial centers affect the quality of the final result and the number of iterations. Many papers have tried either to detect a suitable value for k (the number of clusters) or to introduce a better method for selecting the initial centers, but not both. This research introduces a method that detects a near-optimal value for k together with near-optimal initial centers. The proposed method adds a preprocessing step that obtains the number of clusters and the initial centers before k-means is applied. The idea is to obtain initial clusters using a density-based method that does not require the number of clusters in advance, compute the mean of the objects in each cluster, and use these means as the starting centers for k-means. This improves the quality of the final result, as shown in the experimental results. The preprocessing step uses DBSCAN (Density-Based Spatial Clustering of Applications with Noise), so the paper concentrates on DBSCAN and k-means. The proposed method converges to global minima, which improves the quality of the final result. It requires the two input parameters of DBSCAN, and its time complexity is O(n log n), the same as that of DBSCAN.
computer science, theory & methods, interdisciplinary applications
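The preprocessing idea described in the abstract (run DBSCAN, take the mean of each density-based cluster, then start k-means from those means) can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the `eps` and `min_pts` values, and the synthetic data are all assumptions chosen for illustration. The neighbor search here is brute force, so this sketch runs in O(n^2); the O(n log n) bound quoted in the abstract would require a spatial index, which is omitted for brevity.

```python
import math
import random

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1                # provisionally noise
            continue
        cluster += 1                      # points[i] is a core point
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster       # former noise becomes a border point
            if labels[j] is not None:
                continue                  # already assigned to some cluster
            labels[j] = cluster
            reachable = region_query(points, j, eps)
            if len(reachable) >= min_pts: # expand only through core points
                queue.extend(reachable)
    return labels

def mean(group):
    """Component-wise mean of a non-empty list of points."""
    dim = len(group[0])
    return tuple(sum(p[d] for p in group) / len(group) for d in range(dim))

def initial_means(points, labels):
    """Mean of each DBSCAN cluster; noise points are ignored."""
    groups = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            groups.setdefault(lab, []).append(p)
    return [mean(g) for _, g in sorted(groups.items())]

def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means started from the given (non-empty) list of centers."""
    assign = []
    for _ in range(max_iter):
        assign = [min(range(len(centers)),
                      key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:        # assignments have stabilized
            break
        centers = new_centers
    return centers, assign

# Hypothetical data: two well-separated square blobs of 50 points each.
random.seed(0)
points = [(cx + random.uniform(-1, 1), cy + random.uniform(-1, 1))
          for cx, cy in [(0.0, 0.0), (10.0, 10.0)] for _ in range(50)]

labels = dbscan(points, eps=1.0, min_pts=4)   # assumed parameter values
starts = initial_means(points, labels)        # k is simply len(starts)
centers, assign = kmeans(points, starts)
print("k =", len(centers))
```

In this sketch k is never supplied by the caller; it falls out of the number of clusters DBSCAN finds, and noise points are excluded only from the starting means, not from the final k-means assignment. Both choices are assumptions about how the two steps fit together.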