K-Histograms: an Efficient Clustering Algorithm for Categorical Dataset

Zengyou He, Xiaofei Xu, Shengchun Deng, Bin Dong
DOI: https://doi.org/10.48550/arxiv.cs/0509033
2005-01-01
Abstract: Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-histogram, a new efficient algorithm for clustering categorical data. The k-histogram algorithm extends the k-means algorithm to the categorical domain by replacing the means of clusters with histograms, and it dynamically updates the histograms during the clustering process. Experimental results on real datasets show that the k-histogram algorithm can produce better clustering results than the k-modes algorithm, the algorithm most closely related to our work.
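
The idea in the abstract (cluster "centers" that are per-attribute value histograms, updated as objects move between clusters) can be illustrated with a minimal Python sketch. The abstract does not specify the dissimilarity measure, the initialization, or the stopping rule, so everything below is an assumption made for illustration: the names `dissimilarity` and `k_histograms` are hypothetical, the dissimilarity is taken to be the sum over attributes of one minus the relative frequency of the object's value in the cluster, and the first k rows are used as seeds.

```python
from collections import Counter


def dissimilarity(row, hists, size):
    """Assumed measure: for each attribute, 1 minus the relative
    frequency of the row's value in the cluster's histogram."""
    if size == 0:
        # An empty cluster is treated as maximally dissimilar.
        return float(len(row))
    return sum(1.0 - h[v] / size for v, h in zip(row, hists))


def k_histograms(rows, k, max_iter=20):
    """Illustrative k-means-style loop with histograms as cluster centers."""
    n_attrs = len(rows[0])
    # One Counter (histogram) per attribute for each cluster.
    hists = [[Counter() for _ in range(n_attrs)] for _ in range(k)]
    sizes = [0] * k
    labels = [None] * len(rows)

    def add(c, row):
        for h, v in zip(hists[c], row):
            h[v] += 1
        sizes[c] += 1

    def remove(c, row):
        for h, v in zip(hists[c], row):
            h[v] -= 1
        sizes[c] -= 1

    # Assumption: seed each cluster with one of the first k rows.
    for c in range(k):
        add(c, rows[c])
        labels[c] = c

    for _ in range(max_iter):
        moved = False
        for i, row in enumerate(rows):
            best = min(range(k),
                       key=lambda c: dissimilarity(row, hists[c], sizes[c]))
            if best != labels[i]:
                # Update both histograms immediately when an object moves.
                if labels[i] is not None:
                    remove(labels[i], row)
                add(best, row)
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels, hists


rows = [("a", "x"), ("b", "y"), ("a", "x"),
        ("b", "y"), ("a", "x"), ("b", "z")]
labels, hists = k_histograms(rows, k=2)
print(labels)
```

In this sketch the histograms are adjusted as soon as an object changes cluster, rather than being recomputed once per pass; that is one possible reading of "dynamically updates", not a confirmed detail of the paper.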