Bayesian Robust Graph Contrastive Learning

Yancheng Wang, Yingzhen Yang
DOI: https://doi.org/10.48550/arXiv.2205.14109
2022-06-03
Abstract: Graph Neural Networks (GNNs) have been widely used to learn node representations and with outstanding performance on various tasks such as node classification. However, noise, which inevitably exists in real-world graph data, would considerably degrade the performance of GNNs as the noise is easily propagated via the graph structure. In this work, we propose a novel and robust method, Bayesian Robust Graph Contrastive Learning (BRGCL), which trains a GNN encoder to learn robust node representations. The BRGCL encoder is a completely unsupervised encoder. Two steps are iteratively executed at each epoch of training the BRGCL encoder: (1) estimating confident nodes and computing robust cluster prototypes of node representations through a novel Bayesian nonparametric method; (2) prototypical contrastive learning between the node representations and the robust cluster prototypes. Experiments on public and large-scale benchmarks demonstrate the superior performance of BRGCL and the robustness of the learned node representations. The code of BRGCL is available at https://github.com/BRGCL-code/BRGCL-code.
Machine Learning
What problem does this paper attempt to address?
The paper asks how to train graph neural networks (GNNs) on noisy graph data so that the learned node representations are robust to that noise. The noise may lie in node attributes or in node labels, and either kind significantly degrades GNN performance. Existing GNN methods perform poorly on noisy data because noise propagates through the graph structure and corrupts the representations learned for other nodes. The authors therefore propose a new method, Bayesian Robust Graph Contrastive Learning (BRGCL), which aims to improve the robustness of GNNs on noisy data.

### Problem Background

1. **The Influence of Noise**:
   - Noise in graph data falls into two main categories: attribute noise and label noise. Both degrade GNN performance.
   - Noise can propagate through the topological structure of the graph, further affecting the representation learning of other nodes.
2. **Limitations of Existing Methods**:
   - Manual cleaning and labeling of data can reduce the impact of noise, but it is costly, difficult to scale, and cannot handle large-scale online noisy data.
   - Most existing GNN methods do not account for noise in the input graph, which leads to poor performance in practical applications.

### The Goals of BRGCL

To improve the robustness of GNNs on noisy data, BRGCL pursues the following goals:

- **Completely Unsupervised**: BRGCL does not require any prior knowledge of true labels or categories and depends only on the input node attributes for training.
- **Utilizing Confident Nodes**: BRGCL identifies the nodes that are more confident about their category labels through a new algorithm called Bayesian nonparametric Estimation of Confidence (BEC), and uses these nodes to guide model training.
- **Contrastive Learning**: BRGCL adopts a contrastive learning framework and learns robust node representations by maximizing the mutual information between different views.

### Key Points of the Solution

1. **BEC Algorithm**:
   - The BEC algorithm estimates confident nodes and their prototype representations. Confident nodes are nodes that lie far from the category boundaries and are therefore not easily affected by noise.
   - Using a Bayesian nonparametric method, BEC infers pseudo-labels without access to true labels and estimates confident nodes based on these pseudo-labels.
2. **Contrastive Learning Framework**:
   - BRGCL generates two different graph views and learns robust node representations by maximizing the consistency between them.
   - BRGCL also adopts prototype-based contrastive learning, which further improves robustness by maximizing the mutual information between node representations and the robust prototypes.
3. **Decoupled Training**:
   - To reduce the impact of noise on the classifier, BRGCL decouples node representation learning from the classification task: the BRGCL encoder is trained first to obtain robust node representations, and the classifier is then trained on these representations.

With these components, BRGCL reportedly outperforms existing methods on noisy data, and its robustness to noise is verified in the paper's experiments.
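
The prototype-based contrastive step described above can be made more concrete with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. The pseudo-labels and the confidence mask are taken as given inputs rather than inferred by BEC, the loss is a standard softmax (InfoNCE-style) objective over cluster prototypes, and the function names and the temperature value are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length; all-zero rows stay zero."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def confident_prototypes(z, pseudo_labels, confident, num_clusters):
    """Prototype of cluster k = normalized mean embedding of its confident nodes.

    z:             (n, d) node embeddings from the encoder
    pseudo_labels: (n,) integer cluster assignments (assumed given here)
    confident:     (n,) boolean mask of confident nodes (assumed given here)
    """
    protos = np.zeros((num_clusters, z.shape[1]))
    for k in range(num_clusters):
        members = confident & (pseudo_labels == k)
        if members.any():  # a cluster without confident nodes keeps a zero prototype
            protos[k] = z[members].mean(axis=0)
    return l2_normalize(protos)


def prototypical_contrastive_loss(z, protos, pseudo_labels, temperature=0.5):
    """Pull each node toward its own cluster prototype and away from the others."""
    logits = l2_normalize(z) @ protos.T / temperature      # (n, K) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)    # subtract row max for stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(z)), pseudo_labels].mean()
```

In this sketch, nodes outside the confidence mask still contribute to the loss but never to the prototypes, which is one simple way to keep noisy nodes from shifting the cluster centers. Whether BRGCL weights or filters nodes in exactly this way is not stated in the summary above.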