ProtoMGAE: Prototype-aware Masked Graph Auto-Encoder for Graph Representation Learning

Yimei Zheng, Caiyan Jia
DOI: https://doi.org/10.1145/3649143
IF: 4.157
2024-02-20
ACM Transactions on Knowledge Discovery from Data
Abstract: Graph self-supervised representation learning has gained considerable attention and demonstrated remarkable efficacy in extracting meaningful representations from graphs, particularly in the absence of labeled data. Two representative methods in this domain are graph auto-encoding and graph contrastive learning. However, the former methods primarily focus on global structures, potentially overlooking some fine-grained information during reconstruction. The latter methods emphasize node similarity across correlated views in the embedding space, potentially neglecting the inherent global graph information in the original input space. Moreover, handling incomplete graphs in real-world scenarios, where original features are unavailable for certain nodes, poses challenges for both types of methods. To alleviate these limitations, we integrate masked graph auto-encoding and prototype-aware graph contrastive learning into a unified model to learn node representations in graphs. In our method, we begin by masking a portion of node features and utilize a specific decoding strategy to reconstruct the masked information. This process facilitates the recovery of graphs from a global or macro level and enables handling incomplete graphs easily. Moreover, we treat the masked graph and the original one as a pair of contrasting views, enforcing the alignment and uniformity between their corresponding node representations at a local or micro level. Lastly, to capture cluster structures from a meso level and learn more discriminative representations, we introduce a prototype-aware clustering consistency loss that is jointly optimized with the above two complementary objectives. Extensive experiments conducted on several datasets demonstrate that the proposed method achieves significantly better or competitive performance on downstream tasks, especially for graph clustering, compared with the state-of-the-art methods, showcasing its superiority in enhancing graph representation learning.
computer science, information systems, software engineering
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper aims to solve several key problems in graph self-supervised representation learning:

1. **Balance between global structure and local information**:
   - Graph auto-encoder methods mainly focus on the global structure and may overlook some fine-grained information.
   - Graph contrastive learning methods emphasize the similarity of nodes in the embedding space but may neglect the global graph information in the original input space.
2. **Handling incomplete graphs**:
   - Graph data in the real world often contains noise, resulting in some nodes lacking features. This poses challenges to graph auto-encoders and graph contrastive learning methods and may affect the quality of the learned representations.
3. **Multi-scale learning**:
   - Existing methods usually learn graph representations from a macroscopic or microscopic perspective and lack the ability to capture cluster structures at the mesoscopic level.

To solve these problems, the authors propose a new graph representation learning method, ProtoMGAE (Prototype-aware Masked Graph Auto-Encoder). This method combines masked graph auto-encoding and prototype-aware graph contrastive learning and learns node representations by jointly optimizing multiple complementary objectives, thereby enhancing graph representation learning at the macroscopic, microscopic, and mesoscopic levels.

### Main contributions

1. **Complementary objectives**:
   - Jointly optimize multiple complementary objectives, including node representation contrast at the microscopic level, cluster distribution consistency at the mesoscopic level, and masked feature reconstruction at the macroscopic level, to learn node representations.
2. **Enhanced representations**:
   - Use the masked graph modeling strategy to handle missing node features in incomplete graphs. In addition, through the contrastive objective of the online-target network, the representations of positive sample pairs are brought closer, while ensuring that the representations are evenly distributed on the unit hypersphere, thereby learning more robust and discriminative node representations.
3. **Performance improvement**:
   - Extensive experimental results show that the proposed model performs well on various real-world graph datasets, especially in graph clustering tasks, outperforming the existing state-of-the-art methods.

### Method overview

The main components of the ProtoMGAE model include:

1. **Masked graph reconstruction**:
   - Perturb the node features of the original graph through a random masking strategy to generate a masked graph. Use a two-layer GAT as an encoder to learn robust node representations and reconstruct the masked features through a decoder.
2. **Online-target contrast**:
   - The target network directly takes the original graph as input to construct a contrastive view. Stabilize the parameters of the target network through a momentum update mechanism to ensure that the learning process of the online network is effectively guided.
3. **Prototype-aware clustering**:
   - Introduce trainable prototype vectors to calculate the soft distribution matrix of the predicted representation and the target representation. By selecting a reliable set of nodes, ensure that the cluster distribution of the masked graph is consistent with its assignment in the original graph.

Through these methods, ProtoMGAE can enhance graph representation learning at different levels and improve the performance of downstream tasks.
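
The three components described in the method overview can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The function names, the zero-vector mask, the cosine-similarity softmax with a temperature, the confidence threshold used to pick "reliable" nodes, and the cross-entropy form of the consistency term are all assumptions made for illustration; the summary above does not specify these details. The GAT encoder and the decoder are omitted entirely.

```python
import math
import random


def mask_features(features, mask_rate, rng):
    """Randomly pick nodes and replace their feature rows with zeros.

    Assumption: masked rows are zeroed; a learnable mask token is another option.
    Returns a masked copy of the features and the sorted masked node ids.
    """
    num_nodes = len(features)
    num_masked = int(round(mask_rate * num_nodes))
    masked_ids = sorted(rng.sample(range(num_nodes), num_masked))
    masked = [row[:] for row in features]  # copy so the original graph is kept
    for i in masked_ids:
        masked[i] = [0.0] * len(features[i])
    return masked, masked_ids


def ema_update(target_params, online_params, momentum):
    """Momentum update: target <- m * target + (1 - m) * online."""
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target_params, online_params)]


def soft_assign(embedding, prototypes, temperature=0.5):
    """Soft cluster distribution of one node over the prototype vectors.

    Assumption: softmax over temperature-scaled cosine similarities.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b + 1e-12)

    logits = [cosine(embedding, p) / temperature for p in prototypes]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(online_q, target_q, threshold=0.6):
    """Mean cross-entropy between target and online assignments.

    Assumption: a node is "reliable" when its most likely cluster under the
    target distribution has probability >= threshold.
    """
    reliable = [i for i, q in enumerate(target_q) if max(q) >= threshold]
    if not reliable:
        return 0.0
    total = 0.0
    for i in reliable:
        total -= sum(t * math.log(o + 1e-12)
                     for t, o in zip(target_q[i], online_q[i]))
    return total / len(reliable)
```

In a full training loop, the online network would encode the output of `mask_features`, the target network would encode the unmasked graph and be refreshed with `ema_update` after each step, and `soft_assign` would be applied to both sets of node embeddings before `consistency_loss` compares them.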