An Adaptive Initial Cluster Centers Selection Algorithm for High-Dimensional Partition Clustering

Zhipeng Gao, Yidan Fan, Kun Niu, Ting Wang
DOI: https://doi.org/10.1109/dasc-picom-datacom-cyberscitec.2017.181
2017-01-01
Abstract: Cluster analysis is the process of partitioning a set of data objects into subsets, each of which is a cluster, so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Partitioning methods start from an initial partition and seek the optimal partition by iterative relocation. The results of partition clustering therefore depend heavily on the selection of initial cluster centers. Traditional distance-based initialization methods become inefficient because of the inherent sparsity of high-dimensional data and the curse of dimensionality, while existing improved methods are very sensitive to their parameters. To address these issues, we propose a new initialization method for high-dimensional partition clustering that adaptively chooses initial cluster centers with high density and low mutual similarity and identifies outliers according to their local structure in the high-dimensional space. Experiments on both synthetic and real-world datasets show that the proposed algorithm can achieve better performance.
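The abstract does not give the algorithm itself, so the following is only a rough sketch of the general idea of choosing initial centers that are both dense and mutually dissimilar. It is not the authors' method. The function name `select_initial_centers`, the fixed neighbourhood size `k`, the use of cosine similarity, and the density-times-dissimilarity score are all assumptions made for illustration. In particular, the paper describes its selection as adaptive, whereas this sketch relies on a hand-set `k` and omits outlier detection.

```python
import numpy as np

def select_initial_centers(X, n_clusters, k=5):
    """Return row indices of X to use as initial centers (illustrative sketch).

    Assumptions, not taken from the paper: cosine similarity rescaled to
    [0, 1], density = mean similarity to the k most similar neighbours,
    and a greedy score of density * (1 - similarity to chosen centers).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and the number of rows")

    # Cosine similarity, mapped from [-1, 1] to [0, 1].
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)
    sim = (unit @ unit.T + 1.0) / 2.0

    # Local density: mean similarity to the k most similar other points.
    k = max(1, min(k, n - 1))
    others = sim.copy()
    np.fill_diagonal(others, -np.inf)  # exclude self-similarity
    density = np.sort(others, axis=1)[:, -k:].mean(axis=1)

    # Start from the densest point, then greedily add points that are
    # dense and dissimilar to every center chosen so far.
    centers = [int(np.argmax(density))]
    while len(centers) < n_clusters:
        max_sim = sim[:, centers].max(axis=1)
        score = density * (1.0 - max_sim)
        score[centers] = -np.inf  # never pick the same point twice
        centers.append(int(np.argmax(score)))
    return centers
```

The returned indices could then seed a standard partition method, for example by passing `X[centers]` as the starting centers of a k-means run.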