Abstract: Hypergraphs (i.e., sets of hyperedges) naturally represent group relations (e.g., researchers co-authoring a paper and ingredients used together in a recipe), each of which corresponds to a hyperedge (i.e., a subset of nodes). Predicting future or missing hyperedges bears significant implications for many applications (e.g., collaboration and recipe recommendation). What makes hyperedge prediction particularly challenging is the vast number of non-hyperedge subsets, which grows exponentially with the number of nodes. Since it is prohibitive to use all of them as negative examples for model training, it is inevitable to sample a very small portion of them, and to this end, heuristic sampling schemes have been employed. However, trained models suffer from poor generalization capability for examples of different natures. In this paper, we propose AHP, an adversarial training-based hyperedge-prediction method. It learns to sample negative examples without relying on any heuristic schemes. Using six real hypergraphs, we show that AHP generalizes better to negative examples of various natures. It yields up to 28.2% higher AUROC than the best existing methods and often even outperforms its variants with sampling schemes tailored to test sets.
What problem does this paper attempt to address?
The paper addresses the prediction of future or unobserved hyperedges in a hypergraph. The central difficulty is negative sampling: the number of non-hyperedge subsets grows exponentially with the number of nodes, so they cannot all serve as negative examples during training, and only a small fraction can be sampled. Existing approaches sample with heuristic schemes, and models trained this way generalize poorly to negative examples of a different nature.
### Main Challenges
1. **Large Negative Sample Space**: The number of non-hyperedge subsets in a hypergraph is enormous. For example, in the DBLP dataset, the number of potential negative samples is approximately \(10^{4707}\), while the actual number of hyperedges is only about 23,000.
2. **Limitations of Negative Sampling**: Existing machine-learning methods rely on heuristic sampling schemes (such as SNS, MNS, and CNS), which limit the model's generalization ability, especially when the negatives in the test set are drawn with a different scheme from those in the training set.
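To make the two challenges concrete, here is a minimal Python sketch. The function names, the toy hypergraph, and the size-matched uniform sampler are illustrative assumptions; the sampler is only in the spirit of a heuristic scheme and is not claimed to reproduce SNS, MNS, or CNS.

```python
import math
import random


def log10_non_hyperedges(num_nodes, num_hyperedges):
    """log10 of the number of node subsets that are not observed hyperedges.

    Counts all 2**num_nodes subsets (including the empty set and singletons)
    for simplicity, then subtracts the observed hyperedges.
    """
    return math.log10(2 ** num_nodes - num_hyperedges)


def sample_random_negative(nodes, hyperedges, size, rng, max_tries=1000):
    """Draw a uniformly random node subset of the given size that is not a hyperedge."""
    observed = {frozenset(e) for e in hyperedges}
    for _ in range(max_tries):
        candidate = frozenset(rng.sample(nodes, size))
        if candidate not in observed:
            return candidate
    raise RuntimeError("no negative example found; try a different size")


# Toy hypergraph (hypothetical data): 6 nodes, 3 hyperedges.
nodes = list(range(6))
hyperedges = [{0, 1, 2}, {2, 3}, {3, 4, 5}]
negative = sample_random_negative(nodes, hyperedges, size=3, rng=random.Random(0))

# Even a modest 1,000-node hypergraph has roughly 10**301 candidate negatives.
magnitude = log10_non_hyperedges(1000, 23000)
```

A sampler like this fixes the "nature" of the negatives in advance (here, uniformly random subsets of one size), which is exactly the kind of built-in bias that can hurt generalization when test negatives look different.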
### Solutions
To address these problems, the paper proposes AHP (Adversarial training-based Hyperedge Prediction). Instead of relying on a heuristic scheme, AHP learns to sample negative examples through the following components:
- **Generator**: Learns to generate hard negative examples to improve the training effect of the discriminator.
- **Discriminator**: Scores whether a given node subset forms a hyperedge; these scores drive the parameter updates of both the discriminator and the generator.
- **Adversarial Training**: The generator and the discriminator improve each other through adversarial training. The goal of the generator is to generate negative samples that can "deceive" the discriminator, while the goal of the discriminator is to correctly distinguish positive and negative samples.
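The alternating updates described above can be sketched on a deliberately tiny toy problem. Everything here is an assumption made for illustration and not AHP's architecture: samples are scalars rather than node subsets, the discriminator is a linear score `D(x) = w * x`, the generator is a shift `G(z) = z + b`, the losses are differences of mean scores, and the weight clipping and learning rates are borrowed from common critic-style training practice.

```python
import random


def train_toy_adversarial(steps=500, lr_d=0.5, lr_g=0.01, batch=64, seed=0):
    """Alternate discriminator and generator updates on a 1-D toy problem."""
    rng = random.Random(seed)
    w = 0.0  # discriminator parameter: D(x) = w * x
    b = 0.0  # generator parameter: G(z) = z + b
    for _ in range(steps):
        # "Positive" samples cluster around 2.0; fakes are shifted noise.
        positives = [rng.gauss(2.0, 0.1) for _ in range(batch)]
        fakes = [rng.gauss(0.0, 0.1) + b for _ in range(batch)]
        mean_pos = sum(positives) / batch
        mean_fake = sum(fakes) / batch

        # Discriminator step: minimize L_D = -mean(D(pos)) + mean(D(fake)),
        # whose gradient with respect to w is (mean_fake - mean_pos).
        w -= lr_d * (mean_fake - mean_pos)
        w = max(-1.0, min(1.0, w))  # weight clipping (assumption, keeps w bounded)

        # Generator step: minimize L_G = -mean(D(fake)),
        # whose gradient with respect to b is -w.
        b -= lr_g * (-w)
    return w, b


w, b = train_toy_adversarial()
```

In this toy, the generator's shift `b` drifts toward the region where the positives live, so its samples become progressively harder for the discriminator to separate. This mirrors the "hard negative" intuition, although a real hyperedge generator must output node subsets.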
### Experimental Results
In experiments on six real-world hypergraph datasets, AHP achieves up to 28.2% higher AUROC than the best existing methods, and it often outperforms even its own variants whose sampling schemes are tailored to the test sets. AHP also generalizes better to negative examples of various natures, since it does not rely on any particular heuristic sampling scheme.
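For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The following is a small sketch of that pairwise definition; the function name and the scores are hypothetical and unrelated to the paper's results.

```python
def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical scores: 8 of the 9 pairs are ordered correctly.
score = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Because the value depends on which negatives are in the test set, the same model can obtain different AUROC values under different negative sampling schemes.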
### Markdown Representation of Formulas
The formulas involved in the paper are as follows:
1. Discriminator loss function:
\[
L_D = -\frac{1}{|S|} \sum_{s \in S} \left[ D(s \mid H, X) \right] + \frac{1}{|S|} \sum_{j=1}^{|S|} \left[ D(G(z_j) \mid H, X) \right]
\]
2. Generator loss function:
\[
L_G = -\frac{1}{|S|} \sum_{j=1}^{|S|} \left[ D(G(z_j) \mid H, X) \right]
\]
where \(S\) is the set of positive samples, \(G(z_j)\) is the negative sample generated from noise \(z_j\), \(D\) is the discriminator, \(H\) is the hypergraph, and \(X\) denotes the node features. The second sum runs over \(|S|\) generated negatives, one per positive sample.
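As a worked example, the two losses can be evaluated directly from discriminator scores. This is a minimal sketch assuming the scores \(D(\cdot \mid H, X)\) have already been computed; the function names and the score values are hypothetical.

```python
def discriminator_loss(pos_scores, neg_scores):
    """L_D: negated mean score of positives plus mean score of generated negatives."""
    mean_pos = sum(pos_scores) / len(pos_scores)
    mean_neg = sum(neg_scores) / len(neg_scores)
    return -mean_pos + mean_neg


def generator_loss(neg_scores):
    """L_G: negated mean discriminator score of the generated negatives."""
    return -sum(neg_scores) / len(neg_scores)


# Hypothetical scores for |S| = 3 positives and 3 generated negatives.
pos_scores = [1.0, 0.5, 0.75]  # mean 0.75
neg_scores = [0.25, 0.5, 0.0]  # mean 0.25

l_d = discriminator_loss(pos_scores, neg_scores)  # -0.75 + 0.25 = -0.5
l_g = generator_loss(neg_scores)                  # -0.25
```

The two losses pull the negative scores in opposite directions: lowering \(L_D\) lowers the scores of generated negatives, while lowering \(L_G\) raises them.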
### Summary
AHP addresses negative sampling in hyperedge prediction by learning the sampler through adversarial training instead of relying on heuristic schemes, which improves the model's generalization to negative examples of various natures and its prediction performance.