Graph Contrastive Learning with Node-Level Accurate Difference

Pengfei Jiao, Kaiyan Yu, Qing Bao, Ying Jiang, Xuan Guo, Zhidong Zhao
DOI: https://doi.org/10.1016/j.fmre.2024.06.013
2024-01-01
Fundamental Research
Abstract: Graph contrastive learning (GCL) has attracted extensive research interest due to its powerful ability to capture latent structural and semantic information of graphs in a self-supervised manner. Existing GCL methods commonly adopt predefined graph augmentations to generate two contrastive views, then design a contrastive pretext task between these views with the goal of maximizing their agreement. These methods assume that the augmented graph fully preserves the semantics of the original. However, typical data augmentation strategies in GCL, such as random edge dropping, may alter the properties of the original graph. As a result, previous GCL methods overlook the differences between graphs, which can make it difficult to distinguish graphs that are structurally similar but semantically different. Therefore, we argue that it is necessary to design a method that can quantify the dissimilarity between the original and augmented graphs to more accurately capture the relationships between samples. In this work, we propose a novel graph contrastive learning framework, named Accurate Difference-based Node-Level Graph Contrastive Learning (DNGCL), which helps the model distinguish similar graphs with slight differences by learning node-level differences between graphs. Specifically, we train the model to distinguish between original and augmented nodes via a node discriminator and employ cosine dissimilarity to accurately measure the difference between each node and its augmented counterpart. Furthermore, we apply multiple types of data augmentation commonly used in current GCL methods to the original graph, aiming to learn the differences between nodes under different augmentation strategies and to help the model learn richer local information. We conduct extensive experiments on six benchmark datasets, and the results show that DNGCL outperforms most state-of-the-art baselines, which strongly validates the effectiveness of our model.
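The node-level difference described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the encoder below is a single round of mean neighbourhood aggregation standing in for a trained GNN, the node discriminator is not modelled, and cosine dissimilarity is assumed to mean 1 minus cosine similarity. The names drop_edges, mean_aggregate, and cosine_dissimilarity are hypothetical helpers introduced only for illustration.

```python
import math
import random


def drop_edges(edges, drop_prob, rng):
    """Random edge dropping: keep each edge independently with probability 1 - drop_prob."""
    return [edge for edge in edges if rng.random() >= drop_prob]


def mean_aggregate(features, edges):
    """Average each node's features with its neighbours' (self included).

    Assumption: a toy stand-in for a GNN encoder, not the paper's model.
    """
    neighbours = [[i] for i in range(len(features))]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbours
    ]


def cosine_dissimilarity(a, b, eps=1e-12):
    """Assumed definition: 1 - cos(a, b); 0 for identical directions, 1 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / max(norm_a * norm_b, eps)


# Toy graph: 4 nodes with 2-dimensional features and 4 undirected edges.
rng = random.Random(0)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 2.0]]
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]

aug_edges = drop_edges(edges, drop_prob=0.5, rng=rng)  # augmented view
h_orig = mean_aggregate(features, edges)               # original node embeddings
h_aug = mean_aggregate(features, aug_edges)            # augmented node embeddings

# One difference score per node: how far the augmentation moved that node's embedding.
node_diff = [cosine_dissimilarity(a, b) for a, b in zip(h_orig, h_aug)]
print(node_diff)
```

Under these assumptions, a node whose neighbourhood is untouched by the augmentation gets a score of 0, while nodes that lose edges generally get larger scores. A per-node signal of this kind is what the abstract proposes to use instead of treating the augmented graph as semantically identical to the original.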