Hypergraph Self-supervised Learning with Sampling-efficient Signals

Fan Li, Xiaoyang Wang, Dawei Cheng, Wenjie Zhang, Ying Zhang, Xuemin Lin
2024-04-18
Abstract: Self-supervised learning (SSL) provides a promising alternative for representation learning on hypergraphs without costly labels. However, existing hypergraph SSL models are mostly based on contrastive methods with the instance-level discrimination strategy, suffering from two significant limitations: (1) They select negative samples arbitrarily, which is unreliable in deciding similar and dissimilar pairs, causing training bias. (2) They often require a large number of negative samples, resulting in expensive computational costs. To address the above issues, we propose SE-HSSL, a hypergraph SSL framework with three sampling-efficient self-supervised signals. Specifically, we introduce two sampling-free objectives leveraging the canonical correlation analysis as the node-level and group-level self-supervised signals. Additionally, we develop a novel hierarchical membership-level contrast objective motivated by the cascading overlap relationship in hypergraphs, which can further reduce membership sampling bias and improve the efficiency of sample utilization. Through comprehensive experiments on 7 real-world hypergraphs, we demonstrate the superiority of our approach over the state-of-the-art method in terms of both effectiveness and efficiency.
Machine Learning
### What problems does this paper attempt to solve?

This paper aims to solve two key problems in hypergraph self-supervised learning (HSSL):

1. **Training Bias**:
   - Most existing hypergraph self-supervised learning models are based on contrastive learning and use an instance-level discrimination strategy. These models select negative samples arbitrarily, which makes it hard to reliably decide which node pairs or hyperedge pairs are similar and which are dissimilar, and this introduces training bias. For example, in a co-authorship hypergraph, if authors who work in the same research field but collaborate with different frequencies are treated as negative samples, the model may learn incorrect representations.
2. **Sampling Inefficiency**:
   - These models usually require a large number of negative samples to achieve optimal performance, which incurs high computational costs, especially on large-scale hypergraphs. For example, TriCL has a time complexity of \(O(|V|\times|E|)\) when computing its scoring function, where \(|V|\) and \(|E|\) denote the numbers of nodes and hyperedges, respectively. This complexity severely limits training speed.

To address these problems, the paper proposes SE-HSSL (Sampling-Efficient Hypergraph Self-Supervised Learning), an efficient hypergraph self-supervised learning framework. SE-HSSL tackles the above problems as follows:

- **Introducing Sampling-Free Self-Supervised Signals**: SE-HSSL introduces node-level and group-level self-supervised signals based on canonical correlation analysis (CCA). These signals do not rely on negative samples, which reduces training bias and improves the discriminability of the learned representations.
- **Designing Hierarchical Membership-Level Contrastive Objectives**: SE-HSSL proposes a new hierarchical membership-level contrastive objective that exploits the cascading overlap relationships in hypergraphs. This objective reduces sampling bias in membership-level learning and significantly reduces the number of required negative samples, improving both the effectiveness and the efficiency of the model.

Through experiments on 7 real-world hypergraph datasets, the paper shows that SE-HSSL outperforms existing methods in both effectiveness and efficiency.
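The summary does not spell out the exact form of the CCA-based signals, so the following is only a minimal sketch under stated assumptions. It assumes a CCA-SSG-style objective (an invariance term that pulls two views of the same item together plus a decorrelation term that pushes each view's feature correlation matrix toward the identity), applied once to node embeddings (node-level) and once to hyperedge embeddings (group-level) from two augmented views. The helper names `standardize`, `cca_ssl_loss`, and `hyperedge_embeddings`, the weight `lam`, and the mean-pooling hyperedge readout are all hypothetical illustrations, not the paper's code.

```python
import numpy as np


def standardize(z: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Center and scale each embedding dimension, then divide by sqrt(N).

    After this step, z.T @ z is (approximately) the d x d correlation matrix.
    """
    n = z.shape[0]
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + eps)
    return z / np.sqrt(n)


def cca_ssl_loss(z1: np.ndarray, z2: np.ndarray, lam: float = 1e-3) -> float:
    """CCA-style loss between two views of the same items; uses no negative samples.

    invariance: squared distance between the two standardized views.
    decorrelation: distance of each view's correlation matrix from the identity,
    which discourages collapsed (redundant) embedding dimensions.
    """
    z1, z2 = standardize(z1), standardize(z2)
    eye = np.eye(z1.shape[1])
    invariance = np.sum((z1 - z2) ** 2)
    decorrelation = np.sum((z1.T @ z1 - eye) ** 2) + np.sum((z2.T @ z2 - eye) ** 2)
    return float(invariance + lam * decorrelation)


def hyperedge_embeddings(node_emb: np.ndarray, incidence: np.ndarray) -> np.ndarray:
    """Hypothetical readout: mean-pool member nodes into one vector per hyperedge.

    incidence is a binary |V| x |E| matrix; every hyperedge needs >= 1 member.
    """
    sizes = incidence.sum(axis=0)  # number of member nodes in each hyperedge
    return (incidence.T @ node_emb) / sizes[:, None]


# Toy usage: two noisy "views" of the same 200 node embeddings (16 dims).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 16))
view_a = base + 0.1 * rng.normal(size=base.shape)
view_b = base + 0.1 * rng.normal(size=base.shape)
incidence = (rng.random((200, 50)) < 0.05).astype(float)
incidence[0, :] = 1.0  # make node 0 a member of every hyperedge so none is empty

node_loss = cca_ssl_loss(view_a, view_b)  # node-level signal
group_loss = cca_ssl_loss(
    hyperedge_embeddings(view_a, incidence),
    hyperedge_embeddings(view_b, incidence),
)  # group-level signal
print(f"node-level: {node_loss:.3f}, group-level: {group_loss:.3f}")
```

The relevant design point is cost: the loss only builds \(d \times d\) correlation matrices, so each term scales as \(O(N d^2)\) for \(N\) items and \(d\) dimensions, instead of materializing a node-by-hyperedge score matrix with \(|V|\times|E|\) entries. The hierarchical membership-level contrast is not covered by this sketch.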