A Mixed Data Clustering Algorithm with Noise-Filtered Distribution Centroid and Iterative Weight Adjustment Strategy.

Xiangjun Li, Zijie Wu, Zhibin Zhao, Feng Ding, Daojing He
DOI: https://doi.org/10.1016/j.ins.2021.07.039
IF: 8.1
2021-01-01
Information Sciences
Abstract: Clustering is an important technique for data analysis, and cluster analysis of mixed data remains challenging. This paper proposes a mixed data clustering algorithm with a noise-filtered distribution centroid and an iterative weight adjustment strategy. The proposed algorithm defines a noise-filtered distribution centroid for categorical attributes and combines it with the mean to represent the center of a cluster with mixed attributes. The noise-filtered distribution centroid records the frequency of occurrence of each possible value of the categorical attributes in a cluster more accurately. Furthermore, because the "noise values" are filtered out, the measure of dissimilarity between data objects and cluster centers can be improved. In addition, the algorithm introduces an iterative weight adjustment strategy that combines intra-cluster and inter-cluster information. A unified weight measurement method is used for both numeric and categorical attributes. Attributes with higher intra-cluster homogeneity and inter-cluster heterogeneity are treated as higher-priority attributes and tend to be assigned relatively heavier weights during clustering. Experimental results on different datasets from the UCI repository show that the MCFCIW algorithm outperforms existing partition-based clustering algorithms and data-conversion-based clustering algorithms for mixed data on both cluster validity indices and convergence speed. © 2021 Elsevier Inc. All rights reserved.
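To make the idea of a noise-filtered distribution centroid concrete, here is a minimal Python sketch. It is an illustration only, not the paper's definition: the abstract does not give the filtering rule or the dissimilarity formula, so the relative-frequency threshold `min_freq`, the "1 minus frequency" categorical dissimilarity, and the weighted sum in `mixed_dissimilarity` are all assumptions, and every function name is hypothetical.

```python
from collections import Counter


def noise_filtered_centroid(values, min_freq=0.1):
    """Relative frequency of each categorical value in one cluster.

    ASSUMPTION: a value whose relative frequency is below `min_freq`
    is treated as a "noise value" and dropped; the remaining
    frequencies are renormalized to sum to 1.
    """
    counts = Counter(values)
    total = len(values)
    kept = {v: c for v, c in counts.items() if c / total >= min_freq}
    if not kept:  # empty cluster, or every value fell below the threshold
        return {}
    kept_total = sum(kept.values())
    return {v: c / kept_total for v, c in kept.items()}


def categorical_dissimilarity(value, centroid):
    """ASSUMPTION: dissimilarity is 1 minus the value's frequency in the
    centroid, so filtered-out or unseen values get the maximum of 1."""
    return 1.0 - centroid.get(value, 0.0)


def mixed_dissimilarity(x_num, x_cat, means, centroids, w_num, w_cat):
    """ASSUMPTION: weighted squared distance to the mean for numeric
    attributes plus weighted categorical dissimilarity."""
    numeric = sum(w * (x - m) ** 2 for x, m, w in zip(x_num, means, w_num))
    categorical = sum(
        w * categorical_dissimilarity(v, c)
        for v, c, w in zip(x_cat, centroids, w_cat)
    )
    return numeric + categorical


# One categorical attribute observed in a 10-object cluster.
colors = ["red"] * 6 + ["blue"] * 3 + ["green"]
centroid = noise_filtered_centroid(colors, min_freq=0.2)
```

With `min_freq=0.2`, the single "green" observation falls below the threshold and is filtered out, so the centroid keeps only "red" and "blue" with renormalized frequencies. The per-attribute weights `w_num` and `w_cat` are passed in as fixed inputs here; the iterative weight adjustment described in the abstract is not modeled.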