Contrastive Graph Representation Learning with Adversarial Cross-view Reconstruction and Information Bottleneck

Yuntao Shou, Haozhi Lan, Xiangyong Cao
2024-08-01
Abstract: Graph Neural Networks (GNNs) have received extensive research attention due to their powerful information aggregation capabilities. Despite the success of GNNs, most of them suffer from the popularity bias issue in a graph caused by a small number of popular categories. Additionally, real graph datasets always contain incorrect node labels, which hinders GNNs from learning effective node representations. Graph contrastive learning (GCL) has been shown to be effective in solving the above problems for node classification tasks. Most existing GCL methods are implemented by randomly removing edges and nodes to create multiple contrasting views, and then maximizing the mutual information (MI) between these contrasting views to improve the node feature representation. However, maximizing the mutual information between multiple contrasting views may lead the model to learn some redundant information irrelevant to the node classification task. To tackle this issue, we propose an effective Contrastive Graph Representation Learning with Adversarial Cross-view Reconstruction and Information Bottleneck (CGRL) for node classification, which can adaptively learn to mask the nodes and edges in the graph to obtain the optimal graph structure representation. Furthermore, we innovatively introduce the information bottleneck theory into GCLs to remove redundant information in multiple contrasting views while retaining as much information as possible about node classification. Moreover, we add noise perturbations to the original views and reconstruct the augmented views by constructing adversarial views to improve the robustness of node feature representation. Extensive experiments on real-world public datasets demonstrate that our method significantly outperforms existing state-of-the-art algorithms.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses two core issues that Graph Neural Networks (GNNs) face in node classification tasks:

1. **Popularity Bias**: In most node classification tasks, node classes and node degrees follow a long-tail distribution: a small number of classes contain many frequently interacting nodes, while most nodes have few interactions. In this unbalanced setting, GNNs tend to learn the representations of the popular, frequently interacting nodes, which weakens the model's ability to learn representations for nodes in minority classes.
2. **Noise Interference**: Real graph datasets may contain incorrect node labels, which hinder GNNs from learning effective node representations. For example, mis-citations in a paper citation network introduce noisy information into the data and degrade model performance.

To address these problems, the paper proposes a new method, Contrastive Graph Representation Learning with Adversarial Cross-view Reconstruction and Information Bottleneck (CGRL). The method adaptively generates multiple graph-augmented views and applies information bottleneck theory to remove redundant information while retaining as much information as possible about the node classification task, thereby improving the robustness and effectiveness of node feature representations. Specifically, CGRL relies on three key techniques:

- **Adaptive Automatic Generation of Graph-augmented Views**: By adaptively learning node masks and edge perturbations, the method optimizes the original graph structure and generates augmented views that are semantically similar but structurally heterogeneous, which alleviates the model's popularity bias.
- **Graph Contrastive Learning Based on Information Bottleneck**: By introducing information bottleneck theory, the method reduces the redundant information shared between views while retaining the task-related information needed downstream, which improves the model's generalization ability.
- **Adversarial Cross-view Reconstruction**: By introducing adversarial views, the method further improves the robustness of node feature representations, so that the model preserves the semantic integrity of the original graph under perturbation.

With these techniques, the paper reports that CGRL significantly outperforms existing state-of-the-art algorithms on node classification tasks, especially on unbalanced datasets and noisy data.
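For comparison with CGRL's learned masks, the baseline pipeline that the abstract attributes to most existing GCL methods (random edge removal followed by maximizing cross-view agreement) can be sketched as below. This is a minimal illustration, not the paper's implementation: the function names `drop_edges` and `info_nce`, the cosine similarity, the temperature value, and the choice of InfoNCE as the mutual-information estimator are all assumptions made for this example.

```python
import math
import random


def drop_edges(edges, drop_prob, rng):
    """Random augmentation: remove each edge independently with probability drop_prob."""
    return [edge for edge in edges if rng.random() >= drop_prob]


def cosine(u, v):
    """Cosine similarity of two embeddings (assumes neither is the zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss over nodes.

    Node i in view_a is paired with node i in view_b as the positive;
    every other node in view_b acts as a negative. Minimizing this loss
    maximizes a lower bound on the mutual information between the views.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates in view_b.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Hypothetical toy graph and embeddings, used only to exercise the functions.
rng = random.Random(0)
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
augmented_edges = drop_edges(toy_edges, drop_prob=0.3, rng=rng)

embeddings_a = [[1.0, 0.0], [0.0, 1.0]]
embeddings_b = [[0.9, 0.1], [0.1, 0.9]]
loss = info_nce(embeddings_a, embeddings_b)
```

In a real pipeline the two embedding lists would come from a GNN encoder applied to two independently augmented graphs. The summary's point is that CGRL replaces the fixed `drop_prob` with learned masks and constrains this objective so that the views do not simply share redundant information.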