MD-SPKM: A Set Pair K-Modes Clustering Algorithm for Incomplete Categorical Matrix Data

Chunying Zhang, Ruiyan Gao, Jiahao Wang, Song Chen, Fengchun Liu, Jing Ren, Xiaoze Feng
DOI: https://doi.org/10.3233/ida-205340
Impact factor (IF): 1.7
2021-01-01
Intelligent Data Analysis
Abstract: To cluster incomplete categorical matrix data sets while accounting for the uncertain relationship between samples and clusters, a set pair k-modes clustering algorithm (MD-SPKM) is proposed. First, the theory of set pair information granules is introduced into k-modes clustering: by improving the distance formula of the traditional k-modes algorithm, a set pair distance measure between incomplete matrix samples is defined. Second, to capture the uncertain relationship between a sample and a cluster, the intra-cluster average distance and a threshold formula for deciding whether a sample belongs to multiple clusters are defined. From these, the set pair clustering result is formed, consisting of a positive region, a boundary region and a negative region. Finally, experiments on three selected data sets against four comparison algorithms show that the set pair k-modes clustering algorithm handles incomplete categorical matrix data sets effectively and achieves good clustering performance in terms of Accuracy, Recall, ARI and NMI.
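The abstract describes the pipeline only at a high level and does not give the paper's formulas, so the following is a minimal Python sketch under stated assumptions, not MD-SPKM itself. Assumptions: each sample is an equally shaped matrix of categorical values with `None` for missing cells; the set pair distance is a hypothetical one that counts identical, uncertain (a value is missing) and contrary cells, loosely following the set pair connection degree a + bi + cj, with an assumed weight `w_uncertain = 0.5` on uncertain cells; modes are cell-wise most frequent observed values; the first k samples serve as initial modes; and the three-way rule is hypothetical: a sample whose nearest cluster is k also joins any cluster j whose distance exceeds the nearest one by at most the intra-cluster average distance of k, and such multi-cluster samples go to the boundary region of each. The names `set_pair_distance`, `matrix_mode`, `k_modes` and `three_way_clusters` are illustrative.

```python
from collections import Counter
from typing import Dict, List, Optional, Set

# Assumption: a sample is an equally shaped matrix of categories; None = missing cell.
Matrix = List[List[Optional[str]]]


def set_pair_distance(x: Matrix, y: Matrix, w_uncertain: float = 0.5) -> float:
    """Hypothetical distance: identical (a), uncertain (b: a value is missing),
    contrary (c: both observed, different). Returns (c + w_uncertain*b) / (a+b+c)."""
    a = b = c = 0
    for row_x, row_y in zip(x, y):
        for vx, vy in zip(row_x, row_y):
            if vx is None or vy is None:
                b += 1
            elif vx == vy:
                a += 1
            else:
                c += 1
    n = a + b + c
    return (c + w_uncertain * b) / n if n else 0.0


def matrix_mode(members: List[Matrix]) -> Matrix:
    """Cell-wise most frequent observed value; None if missing in every member."""
    rows, cols = len(members[0]), len(members[0][0])
    mode: Matrix = []
    for r in range(rows):
        row: List[Optional[str]] = []
        for col in range(cols):
            observed = [m[r][col] for m in members if m[r][col] is not None]
            row.append(Counter(observed).most_common(1)[0][0] if observed else None)
        mode.append(row)
    return mode


def k_modes(samples: List[Matrix], k: int, max_iter: int = 10) -> List[Matrix]:
    """Plain k-modes loop; initial modes are the first k samples (assumption)."""
    modes = [samples[i] for i in range(k)]
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: set_pair_distance(s, modes[j])) for s in samples]
        new_modes = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            new_modes.append(matrix_mode(members) if members else modes[j])
        if new_modes == modes:  # modes stopped changing
            break
        modes = new_modes
    return modes


def three_way_clusters(samples: List[Matrix], modes: List[Matrix]) -> List[Dict[str, Set[int]]]:
    """Positive / boundary / negative regions per cluster (hypothetical threshold rule)."""
    dist = [[set_pair_distance(s, m) for m in modes] for s in samples]
    nearest = [min(range(len(modes)), key=lambda j: row[j]) for row in dist]
    # Intra-cluster average distance of each cluster's members to its mode.
    avg = []
    for k in range(len(modes)):
        ds = [dist[i][k] for i in range(len(samples)) if nearest[i] == k]
        avg.append(sum(ds) / len(ds) if ds else 0.0)
    regions = [{"positive": set(), "boundary": set(), "negative": set()} for _ in modes]
    for i, row in enumerate(dist):
        near = nearest[i]
        # Other clusters within avg[near] of the nearest distance also claim sample i.
        others = {j for j in range(len(modes)) if j != near and row[j] - row[near] <= avg[near]}
        if others:
            for k in others | {near}:
                regions[k]["boundary"].add(i)
        else:
            regions[near]["positive"].add(i)
    everyone = set(range(len(samples)))
    for reg in regions:
        reg["negative"] = everyone - reg["positive"] - reg["boundary"]
    return regions


samples = [
    [["a", "b"], ["c", "d"]],
    [["x", "y"], ["z", "w"]],
    [["a", "b"], ["c", None]],
    [["x", "y"], [None, "w"]],
    [["a", "b"], ["z", "w"]],  # half like the first group, half like the second
]
modes = k_modes(samples, k=2)
for k, reg in enumerate(three_way_clusters(samples, modes)):
    print(k, sorted(reg["positive"]), sorted(reg["boundary"]), sorted(reg["negative"]))
# prints:
# 0 [0, 2] [4] [1, 3]
# 1 [1, 3] [4] [0, 2]
```

On this toy input the fifth sample agrees with each group on two of its four cells, so it lands in the boundary region of both clusters, while the other samples fall in positive regions. A different threshold rule or missing-value weight would change these regions.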