Abstract: Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph. A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task. Furthermore, new and diverse datasets have also been created to better evaluate the effectiveness of these new models. However, multiple pitfalls currently exist that hinder our ability to properly evaluate these new methods. These pitfalls mainly include: (1) Lower than actual performance on multiple baselines, (2) A lack of a unified data split and evaluation metric on some datasets, and (3) An unrealistic evaluation setting that uses easy negative samples. To overcome these challenges, we first conduct a fair comparison across prominent methods and datasets, utilizing the same dataset and hyperparameter search settings. We then create a more practical evaluation setting based on a Heuristic Related Sampling Technique (HeaRT), which samples hard negative samples via multiple heuristics. The new evaluation setting helps promote new challenges and opportunities in link prediction by aligning the evaluation with real-world situations. Our implementation and data are available at https://github.com/Juanhui28/HeaRT

What problem does this paper attempt to address?

This paper addresses how graph neural networks (GNNs) are evaluated on link prediction tasks. It identifies several major problems in current evaluation practice and proposes a new evaluation setting to address them. The main problems are:

1. **Underestimation of performance**:
   - The reported performance of some models is lower than what they can actually achieve. For example, standard GNNs perform poorly in prior reports because of improper hyperparameter tuning. With appropriate tuning, these models improve significantly. For some methods (such as Neo-GNN), the improvement can reach 8.5 percentage points.
2. **Lack of a unified evaluation setting**:
   - Different studies use different dataset splits and evaluation metrics, which makes a fair comparison difficult. For example, Cora, Citeseer, and Pubmed are evaluated with different training/validation/test split ratios and different metrics (such as AUC and MRR) across studies. In addition, some methods include validation edges during testing while others do not, which further complicates comparison.
3. **Unrealistic evaluation setting**:
   - The current setting evaluates against randomly selected negative samples, which makes the task too easy and unlike real use. For example, when recommending friends in a social network, we care about ranking candidates for a specific user u, not about pairing u with unrelated nodes. In addition, randomly selected negative samples usually have no common neighbors, so they are easy to classify and do not reflect how a model performs in practical applications.
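The observation that random negatives rarely share neighbors can be checked on a toy graph. The sketch below is an illustration only, not code from the paper: the 20-node ring graph and the helper names (`build_adjacency`, `non_edges`, `zero_cn_fraction`, `sample_random_negatives`) are assumptions made for this example.

```python
import itertools
import random


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def non_edges(adj):
    """All unordered node pairs that are not connected by an edge."""
    nodes = sorted(adj)
    return [(u, v) for u, v in itertools.combinations(nodes, 2)
            if v not in adj[u]]


def sample_random_negatives(adj, num_samples, rng):
    """Uniformly sample negative pairs from the non-edges."""
    candidates = non_edges(adj)
    return rng.sample(candidates, min(num_samples, len(candidates)))


def zero_cn_fraction(adj, pairs):
    """Fraction of the given pairs that share no common neighbor."""
    if not pairs:
        return 0.0
    zero = sum(1 for u, v in pairs if common_neighbors(adj, u, v) == 0)
    return zero / len(pairs)


# Hypothetical toy graph: a ring of 20 nodes.
ring_edges = [(i, (i + 1) % 20) for i in range(20)]
adj = build_adjacency(ring_edges)

all_negatives = non_edges(adj)
print(len(all_negatives))                              # 170 non-edges
print(round(zero_cn_fraction(adj, all_negatives), 2))  # 0.88

# A random sample of negatives is drawn from the same pool.
sampled = sample_random_negatives(adj, 50, random.Random(0))
print(len(sampled))                                    # 50
```

In this ring, only the 20 pairs at distance two share a neighbor, so 150 of the 170 non-edges have none. A model that scores pairs by neighborhood overlap can separate such negatives from positives with little effort, which is the weakness of random negative sampling described in the third item.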
To overcome these problems, the paper proposes the following solutions:

- **Reproducible and fair comparison**:
  - Under the existing evaluation setting, the paper compares different models fairly on multiple common datasets. All models are tuned within the same hyperparameter range and evaluated with multiple metrics.
- **New evaluation setting (HeaRT)**:
  - The paper proposes a new evaluation setting based on the Heuristic Related Sampling Technique (HeaRT). HeaRT makes the evaluation task more challenging by personalizing negative samples to each positive link and by selecting harder negatives, which better simulates real-world use.

Through these improvements, the paper aims to provide a more accurate and reliable way to evaluate link prediction and to support further progress in the field.
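The idea of personalized, heuristic-selected negatives can be sketched as follows. This is a simplified illustration, not the HeaRT implementation: it uses a single heuristic (common neighbors) and corrupts only the target node, whereas the abstract states that HeaRT samples via multiple heuristics. The toy graph and all function names are assumptions made for this example.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hard_negatives_for_source(adj, source, true_target, k):
    """Pick k negatives (source, n) for one positive (source, true_target).

    Candidates are nodes not linked to `source`; those sharing the most
    neighbors with `source` come first. Ties are broken by node id.
    """
    candidates = [n for n in adj
                  if n != source and n != true_target
                  and n not in adj[source]]
    candidates.sort(key=lambda n: (-len(adj[source] & adj[n]), n))
    return candidates[:k]


def reciprocal_rank(score_fn, source, true_target, negatives):
    """Reciprocal rank of the positive among its own negatives.

    Ties count against the positive (pessimistic ranking).
    """
    pos_score = score_fn(source, true_target)
    rank = 1 + sum(1 for n in negatives
                   if score_fn(source, n) >= pos_score)
    return 1.0 / rank


# Hypothetical observed edges; (0, 3) is a held-out positive test edge.
observed = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 4), (2, 5), (5, 6), (6, 7)]
adj = build_adjacency(observed)


def cn_score(u, v):
    """Score a pair by its number of common neighbors."""
    return len(adj[u] & adj[v])


def degree_score(u, v):
    """Score a pair by the product of node degrees."""
    return len(adj[u]) * len(adj[v])


negatives = hard_negatives_for_source(adj, source=0, true_target=3, k=2)
print(negatives)                                       # [4, 5]
print(reciprocal_rank(cn_score, 0, 3, negatives))      # 1.0
print(reciprocal_rank(degree_score, 0, 3, negatives))  # 0.5
```

Averaging `reciprocal_rank` over all test positives gives an MRR in which each positive is ranked only against negatives tied to its own source node, rather than against one shared pool of random pairs.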

Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking

Revisiting Link Prediction: A Data Perspective

New Perspectives on the Evaluation of Link Prediction Algorithms for Dynamic Graphs

Pitfalls in Link Prediction with Graph Neural Networks: Understanding the Impact of Target-link Inclusion & Better Practices

Building a Benchmark for Evaluating Link Prediction Methods

Can GNNs Learn Link Heuristics? A Concise Review and Evaluation of Link Prediction Methods

A Graph Attention Network-Based Link Prediction Method Using Link Value Estimation

Link Prediction with Non-Contrastive Learning

Hashing-Accelerated Graph Neural Networks for Link Prediction

Pitfalls of Graph Neural Network Evaluation

Heuristic Learning with Graph Neural Networks: A Unified Framework for Link Prediction

Improving Graph Neural Network Models in Link Prediction Task Via A Policy-Based Training Method

Mixture of Link Predictors on Graphs

Generative Graph Neural Networks for Link Prediction

Promoting Fairness in Link Prediction with Graph Enhancement

Evaluating Link Prediction Methods

Link Prediction via Graph Attention Network

Neural networks for link prediction in realistic biomedical graphs: a multi-dimensional evaluation of graph embedding-based approaches

Synthetic graphs for link prediction benchmarking

Fully-inductive Link Prediction with Path-Based Graph Neural Network: A Comparative Analysis

Can GNNs Learn Heuristic Information for Link Prediction?