Density Ratio Peak Clustering
Shuliang Wang, Xiaojia Liu, Qi Li, Hanning Yuan, Ye Yuan, Ziwen Feng, Fan Zhang
DOI: https://doi.org/10.1007/978-981-97-2421-5_31
2024-01-01
Abstract: Clustering is an important means of uncovering hidden information and is widely used in economics, biomedicine, and other disciplines. Data imbalance is common in real-world datasets. For example, when fraud detection is performed on transaction data, only a very small fraction of transactions are fraudulent. Clustering on density-imbalanced datasets therefore has practical significance. Various clustering algorithms have been proposed in recent years, but most cannot correctly identify low-density clusters on density-imbalanced datasets, which causes clustering to fail. To this end, we propose a density ratio peak clustering (DRPC) algorithm. It addresses two problems of the original density peak clustering (DPC) algorithm on density-imbalanced datasets: the failure to correctly identify low-density clusters, and the linked allocation errors among non-center points. We conduct experiments on shape datasets, density-imbalanced datasets, and UCI real-world datasets, using normalized mutual information (NMI) as the evaluation metric and comparing against the SNN-DPC, DPC-KNN, DPC, DBSCAN, and K-Means algorithms. Experimental results show that DRPC not only inherits the advantages of DPC but also clusters density-imbalanced datasets more accurately, with the NMI of the clustering results increasing by 1.5% on average.
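
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how NMI between ground-truth labels and predicted cluster labels can be computed. It is not taken from the paper: the function names are illustrative, and the normalization by the arithmetic mean of the two entropies is an assumption, since the abstract does not state which NMI variant the authors use.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label sequences of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * math.log(n * n_ij / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(true_labels, pred_labels):
    """NMI normalized by the arithmetic mean of the two entropies.

    Assumption: arithmetic-mean normalization; other variants
    (min, max, geometric mean) give different values.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    h_true, h_pred = entropy(true_labels), entropy(pred_labels)
    denom = (h_true + h_pred) / 2
    if denom == 0:
        # Both partitions consist of a single cluster, so they are identical.
        return 1.0
    return mutual_information(true_labels, pred_labels) / denom


if __name__ == "__main__":
    # Same partition with permuted cluster ids: NMI is invariant to relabeling.
    print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
    # Statistically independent partitions share no information.
    print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

NMI depends only on how the points are grouped, not on the cluster ids themselves, which makes it suitable for comparing the output of different clustering algorithms against a reference labeling.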