A k-mean clustering algorithm for mixed numeric and categorical data

Amir Ahmad, Lipika Dey
DOI: https://doi.org/10.1016/j.datak.2007.03.016
2007-11-01
Abstract: Use of traditional k-mean type algorithms is limited to numeric data. This paper presents a clustering algorithm based on the k-mean paradigm that works well for data with mixed numeric and categorical features. We propose a new cost function and distance measure based on the co-occurrence of values. The measures also take into account the significance of an attribute towards the clustering process. We present a modified description of the cluster center to overcome the numeric-data-only limitation of the k-mean algorithm and to provide a better characterization of clusters. The performance of this algorithm has been studied on real-world data sets. Comparisons with other clustering algorithms illustrate the effectiveness of this approach.
computer science, information systems, artificial intelligence
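The abstract does not spell out the distance measure or the cluster-center description. The Python below is a minimal sketch, assuming that "co-occurrence of values" means two values of a categorical attribute are close when they co-occur with similar value distributions in the other attributes. Under that assumption, it computes the distance as the total variation distance between those conditional distributions, averaged over the other attributes. It represents the categorical part of a cluster center by the relative frequency of each value. All function names are hypothetical, and `attribute_significance` (mean pairwise value distance) is an assumed stand-in for the paper's attribute significance. Numeric attributes, the cost function, and the k-mean iteration loop are not shown. This is an illustration, not the paper's exact formulation.

```python
from collections import Counter
from itertools import combinations


def conditional_dist(rows, i, x, j):
    """Empirical P(A_j = u | A_i = x), estimated from rows where attribute i equals x."""
    matches = [r[j] for r in rows if r[i] == x]
    if not matches:
        raise ValueError(f"value {x!r} does not occur in attribute {i}")
    n = len(matches)
    return {u: c / n for u, c in Counter(matches).items()}


def cooccurrence_distance(rows, i, x, y):
    """Assumed co-occurrence distance between values x and y of categorical attribute i.

    For each other attribute j, take the total variation distance between
    P(A_j | A_i = x) and P(A_j | A_i = y), then average over all j != i.
    The result lies in [0, 1]: 0 for identical co-occurrence profiles,
    1 when x and y never co-occur with the same values.
    """
    if x == y:
        return 0.0
    others = [j for j in range(len(rows[0])) if j != i]
    if not others:
        raise ValueError("need at least two attributes to measure co-occurrence")
    total = 0.0
    for j in others:
        px = conditional_dist(rows, i, x, j)
        py = conditional_dist(rows, i, y, j)
        # Summing the positive differences gives the total variation distance.
        total += sum(max(0.0, px.get(u, 0.0) - py.get(u, 0.0)) for u in set(px) | set(py))
    return total / len(others)


def attribute_significance(rows, i):
    """Hypothetical significance score: mean distance over all pairs of values of attribute i."""
    values = sorted({r[i] for r in rows})
    pairs = list(combinations(values, 2))
    if not pairs:
        return 0.0
    return sum(cooccurrence_distance(rows, i, x, y) for x, y in pairs) / len(pairs)


def categorical_center(cluster_rows, i):
    """Categorical part of a cluster center: relative frequency of each value of attribute i."""
    counts = Counter(r[i] for r in cluster_rows)
    n = len(cluster_rows)
    return {v: c / n for v, c in counts.items()}


def distance_to_center(rows, i, x, center):
    """Frequency-weighted co-occurrence distance from value x to a categorical center."""
    return sum(w * cooccurrence_distance(rows, i, x, v) for v, w in center.items())


if __name__ == "__main__":
    # Toy data set: (colour, size, fruit); every attribute is categorical.
    rows = [
        ("red", "small", "apple"),
        ("red", "small", "cherry"),
        ("green", "small", "apple"),
        ("yellow", "large", "banana"),
        ("yellow", "large", "banana"),
    ]
    print(cooccurrence_distance(rows, 0, "red", "green"))   # 0.25
    print(cooccurrence_distance(rows, 0, "red", "yellow"))  # 1.0
    print(attribute_significance(rows, 0))                  # 0.75
    center = categorical_center(rows[:3], 0)                # red: 2/3, green: 1/3
    print(distance_to_center(rows, 0, "yellow", center))
```

In the toy data, "red" and "green" both co-occur with "small" and share "apple", so under this assumed measure they end up much closer than "red" and "yellow". Because the distance depends only on co-occurrence statistics of the whole data set, a practical version would precompute it once per value pair before clustering. The sketch recomputes it on every call for clarity.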