Masked AutoEncoder for Graph Clustering without Pre-defined Cluster Number k

Yuanchi Ma, Hui He, Zhongxiang Lei, Zhendong Niu
2024-01-09
Abstract: Graph clustering algorithms with autoencoder structures have recently gained popularity due to their efficient performance and low training cost. However, existing graph autoencoder clustering algorithms based on GCN or GAT not only lack good generalization ability, but also make it difficult to determine the number of clusters automatically. To solve this problem, we propose a new framework called Graph Clustering with Masked Autoencoders (GCMA). It employs our designed fusion autoencoder, based on the graph masking method, for the fusion coding of the graph. It introduces our improved density-based clustering algorithm as a second decoder while decoding with multi-target reconstruction. By decoding the mask embedding, our model can capture more generalized and comprehensive knowledge. The number of clusters and the clustering results can be output end-to-end while improving the generalization ability. Extensive experiments demonstrate the superiority of GCMA, a nonparametric method, over state-of-the-art baselines.
Machine Learning
What problem does this paper attempt to address?
The paper primarily aims to address the following issues:

1. **Automatically determining the number of clusters**: Existing clustering algorithms based on Graph Autoencoders (GAE) usually require the number of clusters \(k\) to be predefined, although it is often unknown in practical applications. The researchers therefore propose a method that determines the number of clusters automatically.
2. **Improving model generalization ability**: Current graph autoencoder clustering algorithms based on Graph Convolutional Networks (GCN) or Graph Attention Networks (GAT) lack good generalization ability, meaning they may not handle unseen data well.
3. **Enhancing the quality of graph embeddings**: Graph autoencoders based on simple graph reconstruction may overemphasize neighborhood information, which is not always beneficial for self-supervised learning. The researchers therefore design better pre-training tasks to improve the quality of the learned graph embeddings.

To address these issues, the researchers propose a new framework named Graph Clustering with Masked Autoencoders (GCMA). The framework combines graph masked autoencoders with an improved density-based clustering algorithm, which enables clustering of graph data without predefining the number of clusters and improves the model's generalization ability and interpretability. Experimental results show that GCMA outperforms existing baseline methods on multiple datasets.
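
To illustrate how a density-based clusterer can report the number of clusters rather than take \(k\) as input, here is a minimal sketch in pure Python. It uses plain DBSCAN as a stand-in, not the improved density-based algorithm from the paper, whose details are not given in this summary. The `embeddings` are hand-made 2-D toy points rather than outputs of a masked autoencoder, and the `eps` and `min_pts` values are assumptions chosen for this toy data.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, n_clusters); label -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels, cluster


# Toy "embeddings" (assumption): three 3x3 grids of points plus one outlier.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
embeddings = [
    (cx + 0.5 * dx, cy + 0.5 * dy)
    for cx, cy in centers
    for dx in range(3)
    for dy in range(3)
]
embeddings.append((50.0, 50.0))  # isolated point, expected to be noise

# eps and min_pts are illustrative values, not taken from the paper.
labels, n_clusters = dbscan(embeddings, eps=1.0, min_pts=3)
print(n_clusters, "clusters,", labels.count(-1), "noise point(s)")
```

The cluster count falls out of the density structure of the points: no \(k\) is passed in, but the result does depend on `eps` and `min_pts`, so a density-based approach trades one hyperparameter for others.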