SpeGCL: Self-supervised Graph Spectrum Contrastive Learning without Positive Samples

Yuntao Shou, Xiangyong Cao, Deyu Meng
2024-10-14
Abstract: Graph Contrastive Learning (GCL) excels at managing noise and fluctuations in input data, making it popular in various fields (e.g., social networks and knowledge graphs). Our study finds that the difference in high-frequency information between augmented graphs is greater than that in low-frequency information. However, most existing GCL methods focus mainly on the time domain (low-frequency information) for node feature representations and cannot make good use of high-frequency information to speed up model convergence. Furthermore, existing GCL paradigms optimize graph embedding representations by pulling the distance between positive sample pairs closer and pushing the distance between positive and negative sample pairs farther away, but our theoretical analysis shows that graph contrastive learning benefits from pushing negative pairs farther away rather than pulling positive pairs closer. To solve the above-mentioned problems, we propose a novel spectral GCL framework without positive samples, named SpeGCL. Specifically, to solve the problem that existing GCL methods cannot utilize high-frequency information, SpeGCL uses a Fourier transform to extract high-frequency and low-frequency information of node features, and constructs a contrastive learning mechanism in Fourier space to obtain better node feature representations. Furthermore, SpeGCL relies entirely on negative samples to refine the graph embedding. We also provide a theoretical justification for the efficacy of using only negative samples in SpeGCL. Extensive experiments on unsupervised learning, transfer learning, and semi-supervised learning have validated the superiority of our SpeGCL framework over state-of-the-art GCL methods.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper attempts to solve two major limitations in how existing graph contrastive learning (GCL) methods learn node feature representations:

1. **Under-utilization of high-frequency information**: Most existing GCL methods focus mainly on node feature representations in the time domain (low-frequency information) and ignore high-frequency information, which limits the convergence speed and performance of the model.
2. **Dependence of the contrastive learning mechanism on positive sample pairs**: The existing GCL paradigm optimizes graph embedding representations by shortening the distance between positive sample pairs and lengthening the distance between positive and negative sample pairs. However, the authors' theoretical analysis shows that graph contrastive learning benefits from pushing negative sample pairs farther apart rather than from pulling positive sample pairs closer.

To overcome these problems, the authors propose a new spectral graph contrastive learning framework, **SpeGCL**. Specifically:

- **Using the Fourier transform to extract high-frequency and low-frequency information**: SpeGCL uses the Fourier transform to extract high-frequency and low-frequency information from node features and constructs a contrastive learning mechanism in Fourier space to obtain better node feature representations.
- **Relying solely on negative sample pairs for contrastive learning**: Unlike traditional GCL methods, SpeGCL relies entirely on negative sample pairs to optimize graph embedding representations, and the authors provide a theoretical justification for the effectiveness of using only negative sample pairs.

### Main contributions of the paper:

1. **Proposing a new spectral graph contrastive learning model (SpeGCL)**: This model uses the Fourier transform to capture both low-frequency and high-frequency information of nodes and promotes the aggregation of node features through the convolution theorem, thereby enhancing the representation ability of nodes.
2. **Proposing a new contrastive learning strategy**: This strategy uses only negative sample pairs to accelerate model training and parameter optimization, and the authors prove that the model can converge using only negative sample pairs.
3. **Conducting extensive experimental evaluations on multiple graph classification tasks**: The experimental results show that SpeGCL outperforms other state-of-the-art GCL methods in unsupervised learning, transfer learning, and semi-supervised learning tasks.

### Related work:

- **Graph contrastive learning (GCL)**: Many GCL methods have been proposed in recent years. These methods generate different graph views through data augmentation strategies and learn graph representations by maximizing the mutual information between views.
- **Frequency-domain deep learning**: Frequency-domain analysis is a classic tool in traditional signal processing and has recently been applied in deep learning to analyze optimization and generalization.

### Method overview:

1. **Multi-view augmentation**: Generate augmented views through node masking and edge perturbation strategies.
2. **Fourier graph convolutional neural network**: Design an efficient method based on the convolution theorem to obtain the feature representations of nodes in Fourier space.
3. **Graph contrastive learning**: Construct samples containing low-frequency and high-frequency information, improve the feature discrimination ability of the encoder through contrastive learning, and optimize using only negative sample pairs.
4. **Self-negative sample sampling**: Propose a self-supervised GCL framework that does not require positive sample pairs and drives model training by optimizing the contrastive loss over negative sample pairs.

### Experimental results:

- **Datasets**: Experiments use the TUDataset and MoleculeNet datasets to verify the effectiveness of SpeGCL in unsupervised learning, transfer learning, and semi-supervised learning tasks.
- **Baseline methods**: SpeGCL is compared with a variety of existing graph contrastive learning methods, and the results show that it performs well across multiple tasks.

In conclusion, the paper addresses these limitations of existing GCL methods and improves the quality and efficiency of graph representation learning by introducing the Fourier transform and a contrastive learning mechanism that relies solely on negative sample pairs.
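
To make the two ideas in the method overview more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes the graph Fourier transform is given by the eigenvectors of the symmetric normalized Laplacian (the paper's Fourier operator may differ), and it uses a simple log-mean-exp of cosine similarities over negative pairs as a stand-in for a negative-only contrastive loss. The function names `split_frequencies` and `negative_only_loss`, the cutoff `n_low`, and the temperature `tau` are all hypothetical.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def split_frequencies(adj, x, n_low):
    """Split node features x into low- and high-frequency parts.

    Assumption: the graph Fourier basis is the eigenvector matrix of the
    normalized Laplacian; the n_low smallest eigenvalues count as "low".
    """
    _, eigvecs = np.linalg.eigh(normalized_laplacian(adj))  # ascending order
    x_hat = eigvecs.T @ x            # forward graph Fourier transform
    low_hat = x_hat.copy()
    low_hat[n_low:] = 0.0            # keep only the lowest frequencies
    high_hat = x_hat - low_hat       # everything else is high frequency
    return eigvecs @ low_hat, eigvecs @ high_hat  # inverse transform


def negative_only_loss(z1, z2, tau=0.5, eps=1e-12):
    """Illustrative loss that uses negative pairs only (hypothetical form).

    Pairs (i, j) with i != j across the two views are treated as negatives;
    the diagonal (what would be positive pairs) is excluded entirely.
    Lower values mean the negatives are less similar.
    """
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + eps)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + eps)
    sim = (z1 @ z2.T) / tau
    neg_mask = ~np.eye(sim.shape[0], dtype=bool)
    return float(np.log(np.mean(np.exp(sim[neg_mask]))))


# Toy usage on a 4-node cycle graph with random 3-dimensional features.
cycle = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
features = np.random.default_rng(0).normal(size=(4, 3))
low_view, high_view = split_frequencies(cycle, features, n_low=2)
loss = negative_only_loss(low_view, high_view)
```

The low- and high-frequency parts always sum back to the original features, so the split loses no information; it only separates the smooth component of the signal from the rapidly varying one. A full eigendecomposition costs cubic time in the number of nodes, so this form is suitable only for small graphs such as the molecule-sized graphs in the benchmark datasets mentioned above.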