Initialization Of K-Modes Clustering For Categorical Data

TaoYing Li,Chen Yan,Zhihong Jin,Li Ye
DOI: https://doi.org/10.1109/ICMSE.2013.6586269
2013-01-01
Abstract:The k-modes clustering algorithm is undoubtedly one of the most widely used partitional algorithms for categorical data. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initialization of clustering. Categorical initialization methods have been proposed to address this problem. In this paper, we present an overview of initialization methods of clustering for numerical data and categorical data respectively with an emphasis on their computational efficiency. We then propose a new initialization method for categorical data, which can obtain the good initial cluster centers using the new distance base on the RD, and explore the methods of density and grid. Finally, proposed method has been tested on diagnosis dataset, a real world data set from UCI Machine Learning Repository, and been analyzed the experimental results, which illustrates that the proposed method is effective and efficient for initializing categorical data.
What problem does this paper attempt to address?