An Initialization Method Of K-Means Clustering Algorithm For Mixed Data

Taoying Li, Zhihong Jin, Yan Chen, Angelo Dan Menga Ebonzo
2014-01-01
Abstract: The k-means clustering algorithm is one of the most widely used partitional algorithms. However, because it behaves like a gradient descent procedure and converges only to a local optimum, it is highly sensitive to the initial choice of cluster centers. Various initialization methods have been proposed to address this problem. In this paper, we present an overview of clustering initialization methods for numerical data and for categorical data, with an emphasis on their computational efficiency. We then propose a new initialization method for mixed data, which selects good initial cluster centers using the MaxAvg distance and yields an effective k-means clustering of mixed data. Finally, the proposed method is verified on three real-world datasets from the UCI Machine Learning Repository; the results show that it is effective and efficient for initializing and partitioning mixed data.
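The abstract does not define the MaxAvg distance, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes a mixed dissimilarity (Euclidean distance on numerical attributes plus a weighted count of categorical mismatches) and greedily picks each new center as the record with the largest average distance to the centers already chosen. The names `mixed_distance`, `max_avg_init`, and the weight `gamma` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A record is (numerical attributes, categorical attributes).
Record = Tuple[Tuple[float, ...], Tuple[str, ...]]


def mixed_distance(a: Record, b: Record, gamma: float = 1.0) -> float:
    """Assumed dissimilarity: Euclidean distance on the numerical part
    plus gamma times the number of categorical mismatches."""
    numeric = math.sqrt(sum((x - y) ** 2 for x, y in zip(a[0], b[0])))
    mismatches = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return numeric + gamma * mismatches


def max_avg_init(data: Sequence[Record], k: int, gamma: float = 1.0) -> List[int]:
    """Return indices of k initial centers (assumed greedy max-average rule)."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of records")

    # First center: the record with the largest total (hence average)
    # distance to all records.
    first = max(
        range(n),
        key=lambda i: sum(mixed_distance(data[i], data[j], gamma) for j in range(n)),
    )
    chosen = [first]

    # Each further center: the unchosen record with the largest average
    # distance to the centers selected so far.
    while len(chosen) < k:
        candidates = [i for i in range(n) if i not in chosen]
        best = max(
            candidates,
            key=lambda i: sum(mixed_distance(data[i], data[c], gamma) for c in chosen)
            / len(chosen),
        )
        chosen.append(best)
    return chosen
```

The returned indices would then seed a k-means-style loop for mixed data. The sketch costs O(n²) distance evaluations for the first center, and an average-based criterion can favor records close to an existing center when the clusters are nearly collinear, so it should be treated as illustrative only.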