Self-Supervised Graph Embedding Clustering

Fangfang Li, Quanxue Gao, Cheng Deng, Wei Xia
2024-10-30
Abstract: The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. However, it combines the K-means clustering and dimensionality reduction processes for optimization, leading to limitations in the clustering effect due to the introduced hyperparameters and the initialization of clustering centers. Moreover, maintaining class balance during clustering remains challenging. To overcome these issues, we propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework. Specifically, we establish a connection between K-means and the manifold structure, allowing us to perform K-means without explicitly defining centroids. Additionally, we use this centroid-free K-means to generate labels in low-dimensional space and subsequently utilize the label information to determine the similarity between samples. This approach ensures consistency between the manifold structure and the labels. Our model effectively achieves one-step clustering without the need for redundant balancing hyperparameters. Notably, we have discovered that maximizing the $\ell_{2,1}$-norm naturally maintains class balance during clustering, a result that we have theoretically proven. Finally, experiments on multiple datasets demonstrate that the clustering results of Our-LPP and Our-MFA exhibit excellent and reliable performance.
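The idea of running K-means without explicit centroids can be illustrated with a standard identity: within a cluster of size $n_k$, the sum of squared distances to the cluster mean equals the sum of all pairwise squared distances divided by $2 n_k$. The sketch below only checks that identity numerically on random data. It is not the paper's formulation, which connects K-means to a manifold structure, and all function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def group_by_label(points, labels):
    """Collect the points that share a cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def cost_with_centroids(points, labels):
    """Classic K-means cost: squared distance of each point to its cluster mean."""
    cost = 0.0
    for members in group_by_label(points, labels).values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        cost += sum(squared_distance(p, centroid) for p in members)
    return cost


def cost_centroid_free(points, labels):
    """Same cost computed from pairwise distances only; no mean is formed."""
    cost = 0.0
    for members in group_by_label(points, labels).values():
        pairwise = sum(squared_distance(p, q) for p in members for q in members)
        cost += pairwise / (2 * len(members))
    return cost


# Random data and an arbitrary assignment, purely for illustration.
random.seed(0)
points = [[random.gauss(0, 1) for _ in range(4)] for _ in range(30)]
labels = [random.randrange(3) for _ in range(30)]

with_centroids = cost_with_centroids(points, labels)
centroid_free = cost_centroid_free(points, labels)
print(with_centroids, centroid_free)
```

The two printed values should agree up to floating-point error, which is why the objective can be expressed purely through relationships between samples.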
Machine Learning
What problem does this paper attempt to address?
This paper addresses the curse of dimensionality in clustering. In high-dimensional spaces, data become sparse and pairwise distances lose their discriminative power, which makes similarities between samples hard to identify. Noise and redundant information in high-dimensional data further complicate clustering and reduce the accuracy of the results. Existing one-step methods that jointly optimize K-means and dimensionality reduction are additionally limited by the hyperparameters they introduce, by the initialization of cluster centers, and by the difficulty of maintaining class balance. To address these problems, the paper proposes a unified framework that combines manifold learning with K-means clustering, yielding a self-supervised graph embedding framework. Its main features are:

1. **Centroid-free K-means**: By establishing a connection between K-means and the manifold structure, clustering can be performed without explicitly defining cluster centers.
2. **Label generation in low-dimensional space**: The centroid-free K-means generates labels in the low-dimensional space, and this label information is then used to determine the similarity between samples, which keeps the manifold structure consistent with the labels.
3. **Natural class balance**: Maximizing the $\ell_{2,1}$-norm naturally maintains class balance during clustering, a property the authors prove theoretically.

Experiments on multiple datasets show that the proposed Our-LPP and Our-MFA methods achieve excellent and reliable clustering performance.
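One way to build intuition for the class-balance claim is the following minimal sketch. It assumes the norm is taken over the columns of a hard 0/1 cluster-indicator matrix (equivalently, the rows of its transpose). The summary does not state the paper's exact formulation, so this setup and the function names are illustrative assumptions. Under that assumption a cluster with $n_k$ members contributes $\sqrt{n_k}$, and because the square root is concave, the total $\sum_k \sqrt{n_k}$ is largest when the cluster sizes are equal.

```python
import math


def indicator_matrix(labels, num_clusters):
    """Hard assignment matrix: entry (i, k) is 1 if sample i is in cluster k."""
    return [[1.0 if label == k else 0.0 for k in range(num_clusters)]
            for label in labels]


def sum_of_column_norms(matrix):
    """Sum of the l2 norms of the columns (assumed form of the l2,1 term)."""
    num_cols = len(matrix[0])
    return sum(math.sqrt(sum(row[k] ** 2 for row in matrix))
               for k in range(num_cols))


# Three assignments of 90 samples to 3 clusters (illustrative sizes).
balanced_labels = [0] * 30 + [1] * 30 + [2] * 30
skewed_labels = [0] * 80 + [1] * 5 + [2] * 5
collapsed_labels = [0] * 90

balanced = sum_of_column_norms(indicator_matrix(balanced_labels, 3))
skewed = sum_of_column_norms(indicator_matrix(skewed_labels, 3))
collapsed = sum_of_column_norms(indicator_matrix(collapsed_labels, 3))

# The more even the cluster sizes, the larger the value.
print(balanced, skewed, collapsed)
```

In this toy setting the balanced assignment scores highest and the collapsed one lowest, so a maximization term of this kind penalizes empty or dominant clusters without a separate balancing hyperparameter.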