Density peak clustering by local centers and improved connectivity kernel

Wenjie Guo, Wei Chen, Xinggao Liu
DOI: https://doi.org/10.1016/j.ins.2024.120439
IF: 8.1
2024-03-08
Information Sciences
Abstract: Similarity calculation is one of the most critical steps in clustering analysis, especially for arbitrarily shaped, elongated structures. Density Peak Clustering (DPC) relies solely on Euclidean distance to measure similarity, so it struggles to cluster arbitrarily shaped data. To address this deficiency, an Improved Connectivity Kernel (ICK) is presented that accelerates the Connectivity Kernel and helps DPC identify clusters with arbitrary structures. ICK consists of two main strategies: (i) Because the Connectivity Kernel is misled by outliers lying between two clusters when those outliers are as dense as the backbone of the clusters, ICK first extracts local centers according to the local density and relative location of points, which eliminates most outliers and boundary points without breaking the original distribution of the data. This avoids the adverse impact of outliers and also saves a great deal of needless computation time; (ii) ICK defines the connection between two local centers as their dissimilarity according to the Connectivity Kernel. However, instead of traversing the entire dataset, ICK evaluates connectivity along only a few specific paths between two local centers, which further reduces the computational complexity of the algorithm to O(n log n). Experiments on synthetic and real-world datasets demonstrate the effectiveness and robustness of the proposed algorithm in practical applications.
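To illustrate why a connectivity-based dissimilarity suits elongated clusters better than Euclidean distance, the following is a minimal sketch, not the paper's ICK. It assumes the connectivity dissimilarity between two points is the minimax path distance, i.e. the smallest achievable "largest hop" over all paths joining them. It computes that distance naively in O(n^3) over every point, with no local-center extraction and no restricted path search. The function name `minimax_path_distance` is hypothetical.

```python
import numpy as np


def minimax_path_distance(points):
    """All-pairs minimax path distance (naive O(n^3) sketch).

    For each pair (i, j), returns the minimum over all paths from i to j
    of the largest Euclidean hop on that path.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances as the initial (direct-hop) costs.
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    for k in range(len(pts)):
        # Allow paths routed through point k; a path costs its largest hop.
        via_k = np.maximum(d[:, k][:, None], d[k, :][None, :])
        d = np.minimum(d, via_k)
    return d


# An elongated chain 0-1-2-3 plus one distant point at 10.
d = minimax_path_distance([[0.0], [1.0], [2.0], [3.0], [10.0]])
print(d[0, 3])  # 1.0: the chain ends are "close" despite Euclidean distance 3
print(d[0, 4])  # 7.0: reaching the distant point requires one hop of length 7
```

Under this assumption, points along a dense, elongated structure stay mutually close, while a gap between structures dominates the dissimilarity. A single dense outlier that bridges the gap would shrink it, which is the weakness that the local-center extraction in strategy (i) targets.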
computer science, information systems