Fast Estimation for the Number of Clusters.

Xiaohong Zhang, Zhenzhen He, Zongpu Jia, Jianji Ren
DOI: https://doi.org/10.1007/978-3-030-63941-9_27
2020-01-01
Abstract: Clustering analysis is widely used in many areas. In many cases, the number of clusters must be assigned manually, and an inappropriate assignment degrades the analysis. Many solutions have been proposed to estimate the optimal number of clusters. However, the accuracy of those solutions drops severely on overlapping data sets. To address this accuracy problem, we propose a fast estimation solution based on cluster centers selected in a static way. In the solution, each data point is assigned a score calculated according to a density-distance model. The score of a data point does not change once it has been generated. The solution takes the k data points with the highest scores as the centers of k clusters. It uses a significant change in the minimal distance between cluster centers to identify the optimal number of clusters in overlapping data sets. The experimental results verify the usefulness and effectiveness of our solution.
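The abstract does not give the scoring formula or the rule for detecting a "significant change", so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a density-peaks-style score (local Gaussian density times the distance to the nearest denser point), a cutoff distance taken as a low quantile of the pairwise distances, and the largest absolute drop in the minimal center-to-center distance as the change criterion. The names `estimate_k`, `k_max`, and `dc_quantile` are hypothetical.

```python
import numpy as np

def estimate_k(X, k_max=10, dc_quantile=0.02):
    """Sketch: estimate the number of clusters from statically scored centers."""
    n = len(X)
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Assumed cutoff distance: a low quantile of the pairwise distances.
    dc = np.quantile(D[np.triu_indices(n, k=1)], dc_quantile)
    # Assumed density: Gaussian-kernel neighbor count (self-term removed).
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0
    # Distance to the nearest point of higher density; the densest point
    # gets its largest distance to any other point.
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    # Static score: computed once and never updated afterwards.
    score = rho * delta
    order = np.argsort(score)[::-1]
    # Minimal distance between the top-k centers for k = 2 .. k_max.
    min_dist = []
    for k in range(2, k_max + 1):
        c = order[:k]
        sub = D[np.ix_(c, c)]
        min_dist.append(sub[np.triu_indices(k, k=1)].min())
    min_dist = np.array(min_dist)
    # Assumed criterion: the k after which the minimal distance drops most.
    drops = min_dist[:-1] - min_dist[1:]
    k_hat = int(np.argmax(drops)) + 2
    return k_hat, order[:k_hat], min_dist
```

Because the top-k center sets are nested, the minimal distance can only stay the same or shrink as k grows; a sharp drop suggests that the newest center falls inside an already represented cluster. Note that this sketch cannot return k = 1, and the absolute-drop rule is one of several plausible readings of "significant change".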