Ultra-DPC: Ultra-scalable and Index-Free Density Peak Clustering
Luyao Ma, Geping Yang, Xiang Chen, Yiyang Yang, Zhiguo Gong, Zhifeng Hao
DOI: https://doi.org/10.1007/978-981-97-2421-5_10
2024-01-01
Abstract: Density-based clustering is a fundamental and effective tool for recognizing connectivity structure. The density peak, the data object with the maximum density within a predefined sphere, plays a critical role. However, Density Peak Estimation (DPE), the process of identifying the nearest denser object for each data object, is extremely expensive. State-of-the-art acceleration methods that rely on an index are still resource-consuming for large-scale data. In this work, we propose Ultra-DPC, an ultra-scalable and index-free Density Peak Clustering method for Euclidean space, to address these challenges. We theoretically study the correlation between two seemingly different clustering algorithms, p-means and density-based clustering, and provide a novel p-means density estimator. Based on this, first, p-means is applied to a set of samples S to find a set of p Local Density Peaks (LDPs), where p << N and N is the number of data objects. Second, an informative LDP-wise affinity graph is constructed and then enriched by a random walk process to incorporate clues from the non-LDP objects. Third, the importance of each LDP is estimated and the most important ones are chosen as seeds. Finally, the class memberships of the remaining objects are determined according to their relations to the LDPs. Ultra-DPC is the fastest DPE method and does not reduce clustering quality. Evaluation on various medium- and large-scale datasets demonstrates both the efficiency and effectiveness of Ultra-DPC over state-of-the-art density-based methods.
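The four steps outlined in the abstract can be illustrated with a rough Python sketch. This is not the authors' implementation, and several parts are assumptions made for illustration only: p-means is taken to mean k-means-style clustering with p centers, run here on all objects rather than on a sample S; the density of an LDP is approximated by the number of objects it owns, not by the paper's p-means density estimator; importance is the classic density peak score (density times distance to the nearest denser LDP); and the random walk enrichment of the affinity graph is omitted. The names `nearest`, `kmeans_centers`, and `ldp_cluster` are hypothetical.

```python
import math
import random


def nearest(x, centers):
    """Index of the center closest to x (Euclidean)."""
    return min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))


def kmeans_centers(points, p, iters=20, seed=0):
    """Plain Lloyd iterations; returns p centers (the stand-in LDPs)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(points, p)]
    for _ in range(iters):
        owner = [nearest(x, centers) for x in points]
        for j in range(p):
            members = [x for x, o in zip(points, owner) if o == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def ldp_cluster(points, p, k, seed=0):
    """Cluster points into k groups via p local density peaks (p >= k >= 1)."""
    # Step 1: find p LDPs and the LDP that owns each object.
    centers = kmeans_centers(points, p, seed=seed)
    owner = [nearest(x, centers) for x in points]
    # Assumed density proxy: number of objects owned by each LDP.
    density = [owner.count(j) for j in range(p)]

    # Step 2 (simplified): link each LDP to its nearest denser LDP.
    order = sorted(range(p), key=lambda j: -density[j])  # densest first
    parent, delta = {}, {}
    for rank, j in enumerate(order):
        denser = order[:rank]
        if denser:
            parent[j] = min(denser, key=lambda i: math.dist(centers[j], centers[i]))
            delta[j] = math.dist(centers[j], centers[parent[j]])
        else:  # the densest LDP has no denser neighbor
            parent[j] = None
            delta[j] = max(math.dist(centers[j], c) for c in centers)

    # Step 3: importance = density * delta (assumed); the top k become seeds.
    seeds = sorted(order, key=lambda j: -density[j] * delta[j])[:k]
    ldp_label = {s: c for c, s in enumerate(seeds)}

    # Step 4: the remaining LDPs inherit the label of their nearest denser
    # LDP, and every object takes the label of the LDP that owns it.
    for j in order:
        if j not in ldp_label:
            ldp_label[j] = ldp_label[parent[j]]
    return [ldp_label[o] for o in owner]
```

Calling `ldp_cluster(points, p=6, k=2)` on a list of coordinate tuples returns one integer label per object. Because the nearest-denser search runs over the p LDPs rather than over all N objects, this simplified version needs no spatial index, which is the intuition behind the index-free claim.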