Robust Subcluster Search and Mergence Clustering

Bocheng Wang, Mulin Chen, Xuelong Li
DOI: https://doi.org/10.1109/TCYB.2024.3446764
2024-09-04
Abstract: In recent years, graph-based clustering has shown outstanding performance and has been widely investigated. It segments the data similarity graph into multiple subgraphs that serve as the final clusters. Many methods integrate graph learning and segmentation into a unified optimization problem to explore the graph structure. However, existing research 1) attempts to derive the final clusters directly from the learned graph, which relies on a highly tight internal distribution within each cluster and is too strict for real-world data; and 2) generally constructs a holistic graph over all samples, so outliers are explicitly involved in graph learning and may corrupt the graph quality. To overcome these limitations, this article establishes a new clustering model called robust subcluster search and mergence (RSSM). Inspired by positive-incentive noise (Pi-Noise), RSSM assumes that outliers are useful for learning the data structure. Treating the few samples with large errors as outliers, RSSM finds the subcentroids by searching for an imbalanced residue distribution. In this way, the subcentroids pull the normal samples together and push the outliers far away. Compared with traditional clusters, the subclusters indicated by the subcentroids are more explicit, with the normal samples tightly connected. A subcluster similarity graph is then constructed to guide the mergence of subclusters. In summary, RSSM performs the search and mergence of subclusters simultaneously with the help of outliers, and generates a graph that is more suitable for clustering. Experiments on several datasets demonstrate the rationality and superiority of RSSM.
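The search-then-merge idea in the abstract can be illustrated with a rough sketch. This is not the RSSM optimization itself: the abstract does not give the objective, and RSSM performs search and mergence simultaneously, whereas the sketch below runs them one after the other. It swaps in two simple stand-ins: a trimmed k-means-style update (the samples with the largest residues are treated as outliers and excluded from the subcentroid update) and single-linkage merging of subcentroids. All function names and the parameters `init_idx`, `n_outliers`, and `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)


def assign(X, centers, n_outliers):
    """Assign each sample to its nearest subcentroid and flag the
    n_outliers samples with the largest residues as outliers."""
    d = sq_dists(X, centers)
    labels = d.argmin(axis=1)
    residue = d[np.arange(len(X)), labels]
    normal = np.ones(len(X), dtype=bool)
    if n_outliers > 0:
        normal[np.argsort(residue)[len(X) - n_outliers:]] = False
    return labels, normal


def search_subcentroids(X, init_idx, n_outliers, n_iter=20):
    """Trimmed k-means-style stand-in for the subcentroid search:
    only normal samples contribute to each subcentroid update."""
    centers = X[np.asarray(init_idx)].astype(float)
    for _ in range(n_iter):
        labels, normal = assign(X, centers, n_outliers)
        for j in range(len(centers)):
            members = normal & (labels == j)
            if members.any():  # leave an empty subcluster's centroid unchanged
                centers[j] = X[members].mean(axis=0)
    labels, normal = assign(X, centers, n_outliers)
    return centers, labels, normal


def merge_subclusters(centers, k):
    """Single-linkage stand-in for the mergence step: repeatedly join the
    two closest subcentroids from different groups until k groups remain."""
    m = len(centers)
    d = sq_dists(centers, centers)
    pairs = sorted((d[i, j], i, j) for i in range(m) for j in range(i + 1, m))
    group = np.arange(m)
    n_groups = m
    for _, i, j in pairs:
        if n_groups <= k:
            break
        gi, gj = group[i], group[j]
        if gi != gj:
            group[group == gj] = gi
            n_groups -= 1
    # Relabel group ids to 0..k-1.
    return np.unique(group, return_inverse=True)[1]


def cluster(X, init_idx, k, n_outliers):
    """Return a cluster label per sample; flagged outliers get label -1."""
    centers, labels, normal = search_subcentroids(X, init_idx, n_outliers)
    group = merge_subclusters(centers, k)
    return np.where(normal, group[labels], -1)
```

Calling `cluster(X, init_idx, k, n_outliers)` with more initial subcentroids than target clusters (`len(init_idx) > k`) mirrors the paper's premise that small, tight subclusters are easier to find than full clusters. The fixed outlier count and the distance-based merge rule are simplifying assumptions of this sketch.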