Weighted Rough Clustering on Categorical Data

Jian Fu, Jian Yin
DOI: https://doi.org/10.1109/csss.2011.5972099
2011-01-01
Abstract: Clustering is an unsupervised machine learning framework that has attracted much attention recently. Current clustering algorithms mainly focus on samples with real-valued attributes, while there is little work on samples represented (partly) by categorical attributes. The difficulty of processing categorical attributes is that the similarity between such samples cannot be evaluated directly by Euclidean distance, as many real-value-based methods do. We tackle this problem by adopting rough set theory. Rough similarity is used to define the similarity between samples. Each attribute is assigned a weight to indicate its importance for clustering, and an adaptive update process based on information gain is performed to find an optimal solution for both the weights and the clusters. The proposed method has three benefits: it deals with categorical data naturally; it is not sensitive to the input order of the samples to be clustered; and it optimizes the importance of attributes and the number of clusters simultaneously. Experiments on UCI benchmark data show its effectiveness in comparison with several well-known earlier methods.
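The abstract does not give the formulas for rough similarity or for the weight update, so the following is only a minimal sketch of the two general ingredients it names: a per-attribute weighted similarity between categorical samples, and the information gain of an attribute with respect to a cluster assignment. The function names (`weighted_match_similarity`, `information_gain`) and the simple weighted-matching measure are assumptions made for illustration; they are not the authors' definitions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def weighted_match_similarity(
    x: Sequence[Hashable], y: Sequence[Hashable], weights: Sequence[float]
) -> float:
    """Weighted fraction of attributes on which two categorical samples agree.

    Illustrative stand-in only: the paper's rough similarity is not
    specified in the abstract.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("samples and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    matched = sum(w for a, b, w in zip(x, y, weights) if a == b)
    return matched / total


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(
    values: Sequence[Hashable], labels: Sequence[Hashable]
) -> float:
    """Reduction in label entropy obtained by splitting on one attribute."""
    n = len(labels)
    groups: dict = {}
    for v, label in zip(values, labels):
        groups.setdefault(v, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

Under these assumptions, an attribute whose values line up with the current cluster labels has high information gain and could be given a larger weight in the next similarity computation, while an attribute that is independent of the labels has a gain of zero.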