Community-Invariant Graph Contrastive Learning

Shiyin Tan, Dongyuan Li, Renhe Jiang, Ying Zhang, Manabu Okumura
2024-05-02
Abstract: Graph augmentation has received great attention in recent years for graph contrastive learning (GCL) to learn well-generalized node/graph representations. However, mainstream GCL methods often favor randomly disrupting graphs for augmentation, which shows limited generalization and inevitably leads to the corruption of high-level graph information, i.e., the graph community. Moreover, current knowledge-based graph augmentation methods can only focus on either topology or node features, causing the model to lack robustness against various types of noise. To address these limitations, this research investigated the role of the graph community in graph augmentation and figured out its crucial advantage for learnable graph augmentation. Based on our observations, we propose a community-invariant GCL framework to maintain graph community structure during learnable graph augmentation. By maximizing the spectral changes, this framework unifies the constraints of both topology and feature augmentation, enhancing the model's robustness. Empirical evidence on 21 benchmark datasets demonstrates the exclusive merits of our framework. Code is released on Github (
Machine Learning, Social and Information Networks
What problem does this paper attempt to address?
The problem that this paper attempts to solve is the limitations of existing Graph Contrastive Learning (GCL) methods in graph augmentation. Specifically:

1. **Limitations of Random Graph Augmentation**: Mainstream GCL methods tend to perturb the graph structure at random for augmentation. Although this approach is simple, its benefit is limited and it inevitably destroys high-level graph information (such as community structure), which limits the model's generalization ability.
2. **Limitations of Knowledge-Guided Graph Augmentation**: Current knowledge-based graph augmentation methods can focus only on either the topology or the node features of the graph, leaving the model insufficiently robust to different types of noise.

To overcome these limitations, the paper proposes a new framework, **Community-Invariant Graph Contrastive Learning (CI-GCL)**, which aims to keep the graph community structure invariant by maximizing the spectral change and to unify the constraints on topological and feature augmentation, thereby improving the robustness and performance of the model.

### Main Contributions:

1. **Proposing the CI-GCL Framework**: By maximizing a spectral change loss, the framework automatically maintains community invariance during graph augmentation and improves the model's performance on downstream tasks.
2. **Theoretical Proof**: The paper shows that the proposed CI constraint can be applied to both topological and feature augmentation, enhancing the robustness of the model.
3. **Experimental Verification**: Experiments on 21 widely used benchmark datasets verify the effectiveness and robustness of CI-GCL.

### Method Overview:

- **Topological Augmentation**: Achieved through edge perturbation and node deletion operations, while maximizing the spectral change to keep the community structure unchanged.
- **Feature Augmentation**: Achieved through feature masking operations, again maximizing the spectral change to keep the community structure unchanged.
- **Optimization and Scalability**: Projected gradient descent is used to jointly optimize the loss functions of topological and feature augmentation, and techniques such as selective eigenvalue decomposition and truncated SVD reduce the computational complexity, making the method suitable for large-scale graph data.

### Experimental Results:

- **Unsupervised Learning**: In graph classification and regression tasks, CI-GCL achieved the best or near-best results on multiple datasets. In particular, the average accuracy reached 77.74% in graph classification tasks, and the average RMSE reached 1.606 in graph regression tasks.
- **Semi-supervised Learning**: In semi-supervised graph classification tasks with 10% labeled data, CI-GCL also performed well, with an average accuracy of 74.0%.

In general, by introducing the concept of community invariance, this paper proposes a new graph contrastive learning framework that addresses the limitations of existing graph augmentation methods and improves the generalization ability and robustness of the model.
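
The "spectral change" that the method overview refers to can be made concrete with a small sketch. The summary does not give the paper's exact loss, so the code below rests on an assumption: spectral change is taken to be the squared Euclidean distance between the sorted eigenvalues of the normalized Laplacian before and after an augmentation. The function names (`normalized_laplacian`, `spectral_change`, `drop_edge`) and the toy graph of two triangles joined by one edge are illustrative and do not come from the paper or its released code.

```python
import numpy as np


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """Return I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    mask = deg > 0  # isolated nodes keep a zero scaling factor
    inv_sqrt[mask] = 1.0 / np.sqrt(deg[mask])
    return np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def spectral_change(adj: np.ndarray, adj_aug: np.ndarray) -> float:
    """Squared L2 distance between the two sorted Laplacian spectra.

    Assumption: this is one plausible way to measure spectral change,
    not necessarily the exact loss used by CI-GCL.
    """
    eig = np.linalg.eigvalsh(normalized_laplacian(adj))       # ascending order
    eig_aug = np.linalg.eigvalsh(normalized_laplacian(adj_aug))
    return float(np.sum((eig - eig_aug) ** 2))


def drop_edge(adj: np.ndarray, u: int, v: int) -> np.ndarray:
    """Return a copy of adj with the undirected edge (u, v) removed."""
    out = adj.copy()
    out[u, v] = out[v, u] = 0.0
    return out


# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

# Spectral change caused by two fixed edge-dropping augmentations.
change_intra = spectral_change(adj, drop_edge(adj, 0, 1))   # edge inside a triangle
change_bridge = spectral_change(adj, drop_edge(adj, 2, 3))  # edge between triangles

print(f"drop intra-community edge (0, 1): {change_intra:.4f}")
print(f"drop inter-community edge (2, 3): {change_bridge:.4f}")
```

This sketch only evaluates fixed perturbations. In a learnable setting such as the one the summary describes, a quantity of this kind would presumably serve as an objective over perturbation probabilities rather than being computed for hand-picked edges.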