A fast sparse graph based clustering technique using dispersion of data points

Mohammad Maksood Akhter, Abdul Atif Khan, Rashmi Maheshwari, R. Jothi, Sraban Kumar Mohanty
DOI: https://doi.org/10.1016/j.neucom.2024.129054
IF: 6
2024-12-06
Neurocomputing
Abstract: Minimum spanning trees (MSTs) have been employed in practice for various exploratory data analyses, e.g., to discover clusters of arbitrary shapes and sizes in diverse datasets. However, the computational complexity of these algorithms becomes a bottleneck when they are applied to very large datasets. The main overhead of these algorithms is the proximity search during construction of the similarity graph, which takes O(N^2) time on a set of N data points. To address this issue, several graph sparsification techniques have been proposed that take O(N^{3/2}) time. This paper proposes an O(N^{4/3})-time local-neighborhood similarity graph construction technique using two levels of partitioning and merging, an asymptotic improvement by a factor of O(N^{1/6}) over the existing methods. To the best of our knowledge, this is the asymptotically fastest known algorithm that uses two levels of partitioning and merging. Experimental analysis shows that the proposed sparse graph construction technique removes 99.46% of the edges of the complete graph while preserving the relevant neighborhood information. Moreover, the MST constructed from the proposed graph efficiently captures the local neighborhood of the data points, as shown by its edge error and weight error rates. Finally, we demonstrate that the proposed approximate-MST-based clustering technique outperforms the best-known existing algorithms in clustering accuracy and execution time on diverse datasets.
computer science, artificial intelligence
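
The pipeline the abstract describes (build a similarity graph, compute its MST, cut long tree edges to form clusters, and score an approximate MST by edge and weight error) can be illustrated with a minimal pure-Python sketch. It is not the authors' O(N^{4/3}) two-level method: `blocked_sparse_edges` is a hypothetical one-level stand-in that sorts points by their first coordinate, splits them into contiguous blocks, and keeps only edges inside a block or between neighbouring blocks. The definitions in `error_rates` are also assumptions, since the abstract does not spell them out, and every function name here is illustrative.

```python
import math
import random
from itertools import combinations


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal_mst(n, edges):
    """Kruskal's algorithm: scan (weight, u, v) edges in ascending weight and
    keep each one that joins two different components."""
    parent = list(range(n))
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def complete_edges(points):
    """Baseline similarity graph: all N(N-1)/2 Euclidean edges, O(N^2) work."""
    return [(math.dist(points[i], points[j]), i, j)
            for i, j in combinations(range(len(points)), 2)]


def blocked_sparse_edges(points, n_blocks):
    """HYPOTHETICAL one-level sparsifier, not the paper's two-level method:
    sort points by first coordinate, cut into contiguous blocks, and keep only
    edges inside a block or between neighbouring blocks."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    size = math.ceil(len(order) / n_blocks)
    blocks = [order[k:k + size] for k in range(0, len(order), size)]
    edges = []
    for b, block in enumerate(blocks):
        nxt = blocks[b + 1] if b + 1 < len(blocks) else []
        pairs = list(combinations(block, 2)) + [(i, j) for i in block for j in nxt]
        edges.extend((math.dist(points[i], points[j]), i, j) for i, j in pairs)
    return edges


def mst_clusters(n, tree, k):
    """MST clustering: delete the k-1 heaviest tree edges, label components."""
    kept = sorted(tree)[:len(tree) - (k - 1)]
    parent = list(range(n))
    for _, u, v in kept:
        parent[_find(parent, u)] = _find(parent, v)
    roots = [_find(parent, i) for i in range(n)]
    ids = {r: c for c, r in enumerate(dict.fromkeys(roots))}  # first-seen order
    return [ids[r] for r in roots]


def error_rates(exact, approx):
    """ASSUMED definitions: edge error = share of approximate-MST edges absent
    from the exact MST; weight error = relative excess of total tree weight."""
    def key(e):
        return (min(e[1], e[2]), max(e[1], e[2]))

    exact_keys = {key(e) for e in exact}
    edge_err = sum(key(e) not in exact_keys for e in approx) / len(exact)
    w_exact = sum(e[0] for e in exact)
    w_approx = sum(e[0] for e in approx)
    return edge_err, (w_approx - w_exact) / w_exact


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data for illustration: two well-separated Gaussian blobs.
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
    pts += [(rng.gauss(30, 1), rng.gauss(30, 1)) for _ in range(100)]
    full, sparse = complete_edges(pts), blocked_sparse_edges(pts, n_blocks=14)
    exact, approx = kruskal_mst(len(pts), full), kruskal_mst(len(pts), sparse)
    print(f"edges kept: {len(sparse)} of {len(full)}")
    print("edge error, weight error:", error_rates(exact, approx))
    labels = mst_clusters(len(pts), approx, k=2)
    print("points per cluster:", [labels.count(c) for c in range(2)])
```

With 200 points and 14 blocks, this toy sparsifier keeps 4,150 of the 19,900 complete-graph edges, and because the two blobs are far apart, cutting the single heaviest edge of the approximate MST still separates them. The number of blocks is the knob that trades edge count against how closely the approximate MST matches the exact one.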