Cluster Center Initialization and Outlier Detection Based on Distance and Density for the K-Means Algorithm

Qi He, Zhenxiang Chen, Ke Ji, Lin Wang, Kun Ma, Chuan Zhao, Yuliang Shi
DOI: https://doi.org/10.1007/978-3-030-16657-1_49
2018-01-01
Abstract: The K-means algorithm, the most classic partition-based clustering method, has several disadvantages. If a data set contains outliers, they can seriously distort the cluster means computed by K-means. In addition, the results of random initialization are very sensitive to the input data and parameters. In this paper, we propose cluster center initialization and outlier detection based on distance and density for the K-means algorithm (KMIDDO), an improved method that optimizes the initial center points and is especially effective when outliers are present. Furthermore, we extend an outlier detection method to improve the clustering result. Our aim is for every two center points to be as far apart as possible and for the density of each center point to be as large as possible. For initialization, we calculate the distance and density of the points. For outlier detection, we treat the outliers as a single class based on distance and density. Experiments on several synthetic and real datasets illustrate the effectiveness and accuracy of the proposed algorithms.
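The abstract names the two ingredients (distance and density) but not the exact formulas, so the following is only a minimal sketch under stated assumptions, not the authors' KMIDDO implementation. It assumes that density is the number of neighbors within a radius `eps`, that each new center maximizes density times distance to the nearest center already chosen, and that a point joins the outlier class (label `-1`) only if it is both sparse (fewer than `min_pts` neighbors) and farther than `eps` from every center. The names `local_density`, `init_centers`, `kmeans_with_outliers`, `eps`, and `min_pts` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_density(points: List[Point], eps: float) -> List[int]:
    """Assumed density measure: count of other points within radius eps."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= eps)
        for i, p in enumerate(points)
    ]


def init_centers(points: List[Point], dens: List[int], k: int) -> List[Point]:
    """Choose k centers that are dense and mutually far apart (assumed score)."""
    # First center: the point with the highest local density.
    chosen = [max(range(len(points)), key=lambda i: dens[i])]
    while len(chosen) < k:
        remaining = [i for i in range(len(points)) if i not in chosen]
        # Assumed score: density * distance to the nearest center chosen so far.
        best = max(
            remaining,
            key=lambda i: dens[i] * min(dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(best)
    return [points[i] for i in chosen]


def kmeans_with_outliers(
    points: List[Point], k: int, eps: float, min_pts: int, iters: int = 50
) -> Tuple[List[Point], List[int]]:
    """Lloyd iterations in which sparse, far-away points form an outlier class (-1)."""
    dens = local_density(points, eps)
    centers = init_centers(points, dens, k)
    labels = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            d = [dist(p, c) for c in centers]
            nearest = min(range(k), key=d.__getitem__)
            # Assumed rule: low density AND far from every center -> outlier class.
            labels[i] = -1 if dens[i] < min_pts and d[nearest] > eps else nearest
        # Recompute each mean from non-outlier members only.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    # Two tight squares plus one isolated point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    centers, labels = kmeans_with_outliers(data, k=2, eps=1.5, min_pts=2)
    print(centers)  # [(0.5, 0.5), (10.5, 10.5)]
    print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Excluding the `-1` points from the mean update is what keeps an isolated point such as `(50, 50)` from pulling a center away, which is the deviation of the mean value that the abstract describes.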