Granular-ball computing-based manifold clustering algorithms for ultra-scalable data

Dongdong Cheng, Shushu Liu, Shuyin Xia, Guoyin Wang
DOI: https://doi.org/10.1016/j.eswa.2024.123313
IF: 8.5
2024-01-31
Expert Systems with Applications
Abstract: Manifold learning is essential for analyzing high-dimensional data, but it suffers from high time complexity. To address this, researchers have proposed selecting anchors and constructing a point-to-anchor similarity matrix to speed up eigendecomposition and reduce space consumption. However, randomly selected anchors fail to represent the data well, and generating anchors with K-means is time-consuming. In this paper, we introduce granular-ball (GB) computing into unsupervised manifold learning and present two algorithms, GB-USC and GB-USEC. Using a coarse-to-fine approach, GB-USC generates high-quality anchors aligned with the data distribution. A bipartite graph is then constructed between data points and anchors, and a low-dimensional manifold embedding is obtained with transfer cut. GB-USEC combines multiple GB-USC clusterings, generates consistent low-dimensional embeddings across dimensions, and determines the final clustering result by voting. Experimental results on several million-scale datasets show that GB-USC achieves performance similar to the state-of-the-art algorithm U-SPEC while reducing the average running time by 33.96%. In addition, the ensemble algorithm improves clustering efficiency by an average of 29.19% compared with U-SENC.
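The anchor-graph pipeline described in the abstract (anchors, then a point-anchor bipartite graph, then transfer cut, then clustering of the embedding) can be sketched in NumPy as below. This is a hedged illustration under stated assumptions, not the authors' GB-USC: the ball-splitting rule (2-means until a ball holds at most `max_size` points), the Gaussian kernel over each point's `K` nearest anchors, and every function name (`granular_balls`, `bipartite_affinity`, `transfer_cut_embedding`, `gb_usc_sketch`) are hypothetical, since the abstract does not specify the paper's granular-ball quality criterion. The transfer-cut step uses the standard reduction: for a bipartite graph with point-anchor weights B, the eigenproblem on all points plus anchors reduces to the small anchor graph W_Y = Bᵀ D_X⁻¹ B, solved as L_Y v = λ D_Y v, with λ = γ(2 − γ) and point embeddings u = D_X⁻¹ B v / (1 − γ).

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Lloyd's k-means with k-means++ seeding (used for ball splits and the final step)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(len(X), 1.0 / len(X))
        centers.append(X[rng.choice(len(X), p=probs)])
    centers = np.array(centers, dtype=float)

    def assign(c):
        return ((X[:, None, :] - c[None]) ** 2).sum(-1).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(centers), centers


def granular_balls(X, max_size, rng):
    """Hypothetical coarse-to-fine GB generation: start with one ball holding all points,
    split any ball larger than max_size with 2-means; ball centers become the anchors."""
    pending, centers = [np.arange(len(X))], []
    while pending:
        idx = pending.pop()
        if len(idx) > max_size:
            labels, _ = kmeans(X[idx], 2, rng)
            if 0 < labels.sum() < len(idx):  # split only if both children are non-empty
                pending += [idx[labels == 0], idx[labels == 1]]
                continue
        centers.append(X[idx].mean(axis=0))
    return np.array(centers)


def bipartite_affinity(X, anchors, K):
    """N x p matrix B: each point links to its K nearest anchors with Gaussian weights."""
    d2 = ((X[:, None, :] - anchors[None]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :K]
    nd2 = np.take_along_axis(d2, nn, axis=1)
    sigma2 = nd2.mean() + 1e-12  # assumed bandwidth: mean squared K-NN distance
    B = np.zeros_like(d2)
    np.put_along_axis(B, nn, np.exp(-nd2 / (2 * sigma2)), axis=1)
    return B


def transfer_cut_embedding(B, k):
    """Transfer cut: solve the small anchor-graph eigenproblem, then lift it to the points."""
    B = B[:, B.sum(axis=0) > 0]  # drop anchors that no point links to
    dX = np.maximum(B.sum(axis=1), 1e-12)
    dY = B.sum(axis=0)
    WY = B.T @ (B / dX[:, None])  # p x p anchor graph B^T D_X^{-1} B
    s = 1.0 / np.sqrt(dY)
    M = np.eye(len(dY)) - s[:, None] * WY * s[None, :]  # normalized Laplacian of W_Y
    lam, W = np.linalg.eigh(M)  # eigenvalues in ascending order
    lam, W = lam[:k], W[:, :k]
    V = s[:, None] * W  # solutions of L_Y v = lam D_Y v
    one_minus_gamma = np.sqrt(np.clip(1.0 - lam, 1e-12, None))  # lam = gamma (2 - gamma)
    return (B / dX[:, None]) @ V / one_minus_gamma  # u = D_X^{-1} B v / (1 - gamma)


def gb_usc_sketch(X, k, K=5, seed=0):
    """Hypothetical end-to-end sketch: GB anchors -> bipartite graph -> transfer cut -> k-means."""
    rng = np.random.default_rng(seed)
    anchors = granular_balls(X, max_size=max(2 * K, int(np.sqrt(len(X)))), rng=rng)
    B = bipartite_affinity(X, anchors, min(K, len(anchors)))
    U = transfer_cut_embedding(B, k)
    labels, _ = kmeans(U, k, rng)
    return labels
```

For example, on three well-separated Gaussian blobs, `gb_usc_sketch(X, 3)` should give each blob its own label. The dense N x p distance matrix is used only for readability; at the million-point scale the paper targets, the nearest-anchor search would have to avoid materializing all point-anchor distances. The eigendecomposition, however, already runs on the p x p anchor graph rather than on an N x N matrix, which is the source of the speedup that anchor-based methods rely on.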
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science