Abstract: The k-means clustering algorithm is popular but has the following main drawbacks: 1) the number of clusters, k, needs to be provided by the user in advance, 2) it can easily reach local minima with randomly selected initial centers, 3) it is sensitive to outliers, and 4) it can only deal with well-separated hyperspherical clusters. In this paper, we propose a Local Density Peaks Searching (LDPS) initialization framework to address these issues. The LDPS framework includes two basic components: the local density, which characterizes the density distribution of a data set, and the local distinctiveness index (LDI), which we introduce to characterize how distinctive a data point is compared with its neighbors. Based on these two components, we search for the local density peaks, which are characterized by high local densities and high LDIs, to address 1) and 2). Moreover, to address 3), we detect outliers, which are characterized by low local densities but high LDIs, and exclude them before clustering begins. Finally, to address 4), we apply the LDPS initialization framework to k-medoids, a variant of k-means that chooses data samples as centers, with diverse similarity measures other than the Euclidean distance. Combining the LDPS initialization framework with k-means and k-medoids, we obtain two novel clustering methods called LDPS-means and LDPS-medoids, respectively. Experiments on synthetic data sets verify the effectiveness of the proposed methods, especially when the ground-truth cluster number k is large. Further, experiments on several real-world data sets (Handwritten Pendigits, Coil-20, Coil-100 and the Olivetti Face Database) illustrate that our methods outperform analogous approaches on both estimating k and unsupervised object categorization.
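The selection rule described in the abstract (high density and high LDI marks a peak, low density and high LDI marks an outlier) can be illustrated with a minimal sketch. The abstract does not give formulas, so everything concrete here is an assumption made for illustration: the local density is taken to be the number of neighbors within a fixed `radius`, the LDI is taken to be the distance to the nearest higher-density point, capped at `radius` and scaled to [0, 1], and the function name `ldps_sketch` and the thresholds `min_density` and `ldi_threshold` are hypothetical. It should not be read as the paper's actual definitions.

```python
import numpy as np


def ldps_sketch(X, radius, min_density=3, ldi_threshold=0.9):
    """Illustrative peak/outlier selection (assumed definitions, not the paper's)."""
    X = np.asarray(X, dtype=float)
    n = len(X)

    # Pairwise Euclidean distances via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # ASSUMPTION: local density = number of other points within `radius`.
    density = (dist <= radius).sum(axis=1) - 1

    # ASSUMPTION: LDI = distance to the nearest point of higher density,
    # capped at `radius` and scaled to [0, 1]. Ties in density are broken
    # by index so that the ordering is strict.
    order = np.arange(n)
    ldi = np.ones(n)
    for i in range(n):
        higher = (density > density[i]) | ((density == density[i]) & (order < i))
        if higher.any():
            ldi[i] = min(dist[i, higher].min(), radius) / radius

    dense = density >= min_density
    distinctive = ldi >= ldi_threshold
    peaks = np.flatnonzero(dense & distinctive)      # high density, high LDI
    outliers = np.flatnonzero(~dense & distinctive)  # low density, high LDI
    return peaks, outliers


# Toy data: two tight 3x3 grids far apart, plus one isolated point.
grid = np.array([[x, y] for y in (-0.5, 0.0, 0.5) for x in (-0.5, 0.0, 0.5)])
X = np.vstack([grid, grid + 10.0, [[50.0, 50.0]]])

peaks, outliers = ldps_sketch(X, radius=2.0)
print("estimated k:", len(peaks))       # one peak per grid
print("initial centers:", X[peaks])
print("outlier indices:", outliers)
```

Under these assumptions, the number of peaks serves as the estimate of k, the peak points serve as deterministic initial centers, and the flagged outliers would be removed before k-means or k-medoids is run.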

A robust algorithm for cluster initialization using uniform effect of k-Means

Algorithm for Initialization of K-Means Clustering Center Based on Optimized-Division

Cluster Center Initialization and Outlier Detection Based on Distance and Density for the K-Means Algorithm

Adaptive Initialization Method for K-means Algorithm

Effective Deterministic Initialization for $K$-Means-like Methods Via Local Density Peaks Searching.

A Hierarchical-Based Initialization Method for K-Means Algorithm

Careful Seeding for k-Medoids Clustering with Incremental k-Means++ Initialization

New Method for the Initialization of Clusters Based on Data Distribution

Stable Initialization Scheme for K-means Clustering

New Initialization Method for Cluster Center

An Initialization Method Of K-Means Clustering Algorithm For Mixed Data

Research on Heuristic Initialization-Independent K-Means Algorithm

Research on Initialization of K-means Type Multi-View Clustering

Outlier Factor Based Partitional Clustering Analysis with Constraints Discovery and Representative Objects Generation

A survey on the initialization methods for the k-means algorithm

Initializing K-means Clustering Using Affinity Propagation

The Optimal Initial Centers Clustering Algorithm Based on Local Outlier Factor

Adaptive Initialization Method Based on Spatial Local Information for K-Means Algorithm

Nonuniform Sparse Data Clustering Cascade Algorithm Based on Dynamic Cumulative Entropy

An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application

Improved Outlier Robust Seeding for k-means