A robust algorithm for cluster initialization using the uniform effect of k-Means

Peng Liuqing (彭柳青), Zhang Junying (张军英), Xu Jin (许进)
DOI: https://doi.org/10.13245/j.hust.2010.08.024
2010-01-01
Abstract: Building on the uniform effect of k-Means clustering, a new robust clustering initialization algorithm is proposed to improve clustering quality on outlier-contaminated datasets. The uniform effect of k-Means ensures the following relationships among clusters: clusters lying in any sparse sample area all have large sizes, clusters lying in any dense area all have small sizes, and neighboring clusters in a dense area have comparable sizes. The algorithm first partitions the dataset with k-Means using an excessive number of clusters, which readily produces the size relationships above. It then merges neighboring small-size clusters to obtain the dense sample areas of the data space, and these areas serve as the initial clusters. Outliers, by contrast, are distributed very sparsely, so most of them fall into large-size clusters and have very little effect on the initialization. Theoretical analysis and a variety of experiments show the effectiveness of the proposed algorithm.