Hierarchical Graph Contrastive Learning Via Debiasing Noise Samples with Adaptive Repelling Ratio

Peishuo Liu, Cangqi Zhou, Jing Zhang, Qianmu Li, Dianming Hu
DOI: https://doi.org/10.1109/icdm58522.2023.00051
2023-01-01
Abstract: In recent years, graph contrastive learning (GCL) has emerged as a highly successful approach to unsupervised graph representation learning. GCL learns graph representations by pulling positive sample pairs together and pushing negative sample pairs apart in the representation space, without any manual labels. Graph-structured data inherently exhibits a hierarchical structure, which is crucial for organizing and managing graphs, and leveraging this structure improves the accuracy of the learned representations. However, current GCL methods tend to overlook the hierarchy, which can introduce sampling bias during node selection: nodes with the same semantics may be sampled as negative pairs. To overcome these limitations, we present a novel framework, Hierarchical Graph Contrastive Learning via Debiasing Noise Samples with Adaptive Repelling Ratio (HGClear). Our framework learns node representations and the graph hierarchy simultaneously in an end-to-end manner. While designing the method, we found that the accuracy of node category prediction significantly affects the quality of the representations. To remove the bias caused by noise samples, we introduce a module that handles boundary nodes (i.e., noise samples), which are vulnerable to mislabeling. Specifically, a hierarchy detection module captures both coarse-grained views and the category attributes of nodes. Using these results, we identify boundary nodes and set varying repelling ratios based on category labels, replacing the conventional temperature coefficient in the contrastive loss. In addition, an intra-view node contrast module both eliminates the bias resulting from noise samples and enhances the uniqueness of node representations. Extensive experiments on node classification datasets show that HGClear produces encouraging results and outperforms some state-of-the-art methods.
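The abstract does not give the loss formula, so the following is only a minimal sketch of the general idea of replacing a single temperature coefficient with a label-dependent scaling in an InfoNCE-style contrastive loss. It is not the HGClear loss. The function name `contrastive_loss`, the parameters `tau_same` and `tau_diff`, and the rule "pairs predicted to share a category are repelled less strongly" are all assumptions made for illustration, as is the use of NumPy.

```python
import numpy as np

def contrastive_loss(z1, z2, labels, tau_same=1.0, tau_diff=0.5):
    """InfoNCE-style loss with a per-pair scaling instead of one temperature.

    z1, z2 : (n, d) embeddings of the same n nodes under two views.
    labels : (n,) predicted category ids (hypothetical hierarchy-module output).
    tau_same, tau_diff : assumed scalings for same/different-category pairs.
    """
    # L2-normalize so the dot product is cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T  # (n, n); diagonal entries are the positive pairs

    # Assumption: pairs predicted to share a category get a larger scaling,
    # which flattens their logits and weakens how hard they are pushed apart.
    same = labels[:, None] == labels[None, :]
    logits = sim / np.where(same, tau_same, tau_diff)

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Average negative log-probability of each node's positive pair.
    return float(-np.mean(np.diag(log_prob)))

# Usage with synthetic embeddings (illustrative data only).
rng = np.random.default_rng(0)
view1 = rng.normal(size=(6, 4))
view2 = view1 + 0.1 * rng.normal(size=(6, 4))
categories = np.array([0, 0, 1, 1, 2, 2])
loss = contrastive_loss(view1, view2, categories)
```

Setting `tau_same` equal to `tau_diff` recovers the usual single-temperature loss, which makes the label-dependent term easy to compare against a baseline.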