LAC: Graph Contrastive Learning with Learnable Augmentation in Continuous Space

Zhenyu Lin, Hongzheng Li, Yingxia Shao, Guanhua Ye, Yawen Li, Quanqing Xu
2024-10-20
Abstract: Graph Contrastive Learning frameworks have demonstrated success in generating high-quality node representations. The existing research on efficient data augmentation methods and ideal pretext tasks for graph contrastive learning remains limited, resulting in suboptimal node representations in the unsupervised setting. In this paper, we introduce LAC, a graph contrastive learning framework with learnable data augmentation in an orthogonal continuous space. To capture the representative information in the graph data during augmentation, we introduce a continuous view augmenter that applies a masked topology augmentation module and a cross-channel feature augmentation module to adaptively augment the topological information and the feature information, respectively, within an orthogonal continuous space. The orthogonal nature of the continuous space ensures that the augmentation process avoids dimension collapse. To enhance the effectiveness of pretext tasks, we propose an information-theoretic principle named InfoBal and introduce corresponding pretext tasks. These tasks enable the continuous view augmenter to maintain consistency in the representative information across views while maximizing diversity between views, and allow the encoder to fully utilize the representative information in the unsupervised setting. Our experimental results show that LAC significantly outperforms the state-of-the-art frameworks.
Machine Learning, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper addresses two main challenges that Graph Contrastive Learning (GCL) frameworks face in the unsupervised setting:

1. **Inadequate data augmentation methods**:
   - **Topological augmentation**: Existing methods apply small discrete perturbations to the graph topology (such as deleting edges or nodes), which can destroy the representative information in the graph. For example, in chemical molecule classification, randomly removing a key chemical bond produces a low-quality augmented view.
   - **Feature augmentation**: Existing methods discretize continuous feature information, which can lead to dimension collapse and destroy the representative information in the graph data.
2. **Ineffective pretext tasks**:
   - Existing pretext tasks fail to preserve the consistency of representative information across augmented views while also ensuring diversity between views. As a result, the model may generate overly similar views and embeddings, which degrades the overall performance of the GCL framework.

To address these problems, the authors propose LAC (Learnable Augmentation in Continuous space), a new graph contrastive learning framework with the following components:

- **Learnable continuous-space augmentation (Continuous View Augmenter, CVA)**: CVA adaptively augments topological and feature information in an orthogonal continuous space through the Masked Topology Augmentation (MTA) module and the Cross-channel Feature Augmentation (CFA) module, avoiding the information loss and dimension collapse caused by discrete augmentation methods.
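- **Information balance principle (InfoBal)**: To make the pretext tasks more effective, the authors introduce InfoBal, an information-theoretic principle with two sub-principles that enhance the consistency and the diversity of views, respectively, and that help the encoder make full use of the representative information.

Through these improvements, LAC significantly outperforms existing state-of-the-art GCL frameworks in the unsupervised setting.

### Formula summary

1. **Topological representation** (eigendecomposition of the adjacency matrix):

   \[
   A = U \Lambda U^T, \quad U^T U = I_N
   \]

   where \(U \in \mathbb{R}^{N \times N}\) is an orthogonal matrix whose columns form an orthonormal basis (the eigenvectors of \(A\)), and \(\Lambda\) is the diagonal matrix of the corresponding eigenvalues.

2. **Feature representation** (node features expressed in the same basis):

   \[
   C = U^T X
   \]

   where \(X\) is the node feature matrix. Because \(U\) is orthogonal, the features are recovered exactly as \(X = U C\).

3. **Augmented diagonal matrix**:

   \[
   \Lambda' = (I - M) \odot \Lambda + M \odot \tilde{\Lambda}
   \]

   where \(M\) is a binary diagonal mask marking the eigenvalues to be augmented, \(\tilde{\Lambda}\) is a diagonal matrix of replacement values for the masked entries produced by the masked topology augmentation (MTA) module, and \(\odot\) is the element-wise product.

4. **Final augmented view**:

   \[
   V' = (A', X') = (U \Lambda' U^T, \; U C')
   \]

   where \(C'\) denotes the augmented feature coordinates produced by the cross-channel feature augmentation (CFA) module.

5. **Mutual information estimate** (InfoNCE form):

   \[
   \hat{I}(f(V); f(V')) = \frac{1}{2N} \sum_{i=1}^{N} \left[ \ell(z_i, z'_i) + \ell(z'_i, z_i) \right], \qquad
   \ell(u_i, v_i) = \log \frac{\exp(s(u_i, v_i)/\tau)}{\sum_{k=1}^{N} \exp(s(u_i, v_k)/\tau)}
   \]

   where \(z_i\) and \(z'_i\) are the embeddings of node \(i\) produced by the encoder \(f\) on views \(V\) and \(V'\), \(s(\cdot, \cdot)\) is a similarity function, and \(\tau\) is a temperature. This estimator lower-bounds the mutual information up to an additive constant (\(\log N\)) rather than computing it exactly.

These formulas show how LAC performs data augmentation in the orthogonal continuous space and how the information balance principle guides the learning of both the augmentation module and the encoder.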
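The spectral augmentation and the InfoNCE estimate in the formula summary can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: \(A\) is assumed symmetric (an undirected graph) so that `np.linalg.eigh` returns an orthonormal basis; the replacement eigenvalues `lam_tilde` and the channel-mixing matrix `W` are hypothetical fixed stand-ins for the learnable MTA and CFA modules; and cosine similarity is assumed for \(s\). The helper names `spectral_basis`, `augment_view`, and `infonce` are hypothetical.

```python
import numpy as np


def spectral_basis(A):
    """Eigendecomposition A = U diag(lam) U^T of a symmetric adjacency matrix."""
    # eigh assumes A is symmetric and returns orthonormal eigenvectors as the
    # columns of U (so U.T @ U == I), with eigenvalues in ascending order.
    lam, U = np.linalg.eigh(A)
    return U, lam


def augment_view(A, X, mask, lam_tilde, W):
    """Build (A', X') = (U Lam' U^T, U C') in the orthogonal spectral basis.

    mask:      boolean vector of length N; True marks eigenvalues to replace.
    lam_tilde: replacement eigenvalues (hypothetical stand-in for MTA output).
    W:         d x d channel-mixing matrix (hypothetical stand-in for CFA).
    """
    U, lam = spectral_basis(A)
    Lam = np.diag(lam)
    M = np.diag(mask.astype(float))        # binary diagonal mask matrix
    I = np.eye(len(lam))
    # Lam' = (I - M) * Lam + M * Lam_tilde, with element-wise products.
    Lam_aug = (I - M) * Lam + M * np.diag(lam_tilde)
    C = U.T @ X                             # feature coordinates in basis U
    C_aug = C @ W                           # hypothetical cross-channel mixing
    A_aug = U @ Lam_aug @ U.T
    X_aug = U @ C_aug
    return A_aug, X_aug


def infonce(Z1, Z2, tau=0.5):
    """Symmetric InfoNCE estimate with cosine similarity and temperature tau."""
    Z1 = Z1 / np.linalg.norm(Z1, axis=1, keepdims=True)
    Z2 = Z2 / np.linalg.norm(Z2, axis=1, keepdims=True)
    S = (Z1 @ Z2.T) / tau                   # S[i, k] = s(z_i, z'_k) / tau

    def ell(S):
        # Row-wise log-softmax evaluated at the positive pair (the diagonal),
        # computed stably by subtracting each row's maximum.
        row_max = S.max(axis=1, keepdims=True)
        log_den = row_max[:, 0] + np.log(np.exp(S - row_max).sum(axis=1))
        return np.diag(S) - log_den

    N = Z1.shape[0]
    # S.T[i, k] = s(z'_i, z_k) / tau because cosine similarity is symmetric.
    return (ell(S).sum() + ell(S.T).sum()) / (2 * N)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d = 6, 4
    B = rng.random((N, N))
    A = ((B + B.T) > 1.0).astype(float)     # random undirected graph
    np.fill_diagonal(A, 0.0)
    X = rng.normal(size=(N, d))
    # No masked eigenvalues and identity mixing reproduce the original view.
    A0, X0 = augment_view(A, X, np.zeros(N, bool), np.zeros(N), np.eye(d))
    print(np.allclose(A0, A), np.allclose(X0, X))  # True True
```

Because \(U\) is orthogonal, an unmasked view with identity mixing reconstructs \((A, X)\) exactly, and a masked view changes only the selected eigenvalues of \(A'\). In LAC the augmentations are learned rather than supplied by hand as they are here.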