Temporal Graph Benchmark for Machine Learning on Temporal Graphs

Shenyang Huang, Farimah Poursafaei, Jacob Danovitch, Matthias Fey, Weihua Hu, Emanuele Rossi, Jure Leskovec, Michael Bronstein, Guillaume Rabusseau, Reihaneh Rabbany
2023-09-28
Abstract: We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. TGB datasets are of large scale, spanning years in duration, incorporate both node and edge-level prediction tasks and cover a diverse set of domains including social, trade, transaction, and transportation networks. For both tasks, we design evaluation protocols based on realistic use-cases. We extensively benchmark each dataset and find that the performance of common models can vary drastically across datasets. In addition, on dynamic node property prediction tasks, we show that simple methods often achieve superior performance compared to existing temporal graph models. We believe that these findings open up opportunities for future research on temporal graphs. Finally, TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research, including data loading, experiment setup and performance evaluation. TGB will be maintained and updated on a regular basis and welcomes community feedback. TGB datasets, data loaders, example codes, evaluation setup, and leaderboards are publicly available at https://tgb.complexdatalab.com/.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses three problems:

1. **Limitations of existing temporal graph datasets**:
   - Existing temporal graph benchmark datasets are small and do not reflect the scale of real-world networks.
   - They cover few domains, mainly social and interaction networks, and so lack diversity.
   - Large-scale datasets for dynamic node property prediction tasks are lacking.
2. **Problems with existing evaluation protocols**:
   - Existing evaluation protocols are overly optimistic, so reported results do not match how models perform in practical applications.
   - In the dynamic link prediction task, the commonly used negative sample generation method is too simple: it tends to produce negatives that are easy to predict, which makes the performance evaluation inaccurate.
3. **Lack of a standardized evaluation platform**:
   - There is currently no open, standardized benchmark platform for evaluating machine learning models on temporal graphs realistically and reproducibly.

To solve these problems, the paper proposes the **Temporal Graph Benchmark (TGB)**, a platform containing multiple challenging and diverse benchmark datasets that aims to provide more realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. Specific contributions include:

- **Large-scale and diverse datasets**: TGB contains datasets from different domains, covering node-level and edge-level tasks, and these datasets are far larger than existing ones.
- **Improved evaluation protocols**: For the dynamic link prediction task, a more challenging negative sample generation method is introduced, and Mean Reciprocal Rank (MRR) is used as the evaluation metric; for the dynamic node property prediction task, the Normalized Discounted Cumulative Gain (NDCG) metric is introduced.
- **Public leaderboards and reproducible results**: TGB provides an automated pipeline that supports data loading, experimental setup, and performance evaluation, and tracks the latest progress through public leaderboards.

### Formula summary

- **Evaluation metric for the dynamic link prediction task**:
  \[ \text{MRR}=\frac{1}{\vert E_{\text{test}}\vert}\sum_{(u, v, t)\in E_{\text{test}}}\frac{1}{\text{rank}(v)} \]
  where \(E_{\text{test}}\) is the set of test edges and \(\text{rank}(v)\) is the rank of the positive destination \(v\) among the negative samples drawn for the edge \((u, v, t)\).
- **Prediction target for the dynamic node property prediction task**:
  \[ y_t[u, v]=\frac{\sum_{t < t_i\leq t + k}w(u, v, t_i)}{\sum_{z\in N}\sum_{t < t_i\leq t + k}w(u, z, t_i)} \]
  where \(w(u, v, t_i)\) is the weight of the edge \((u, v, t_i)\), \(N\) is the set of candidate nodes, and \(k\) is the prediction window size. In words, \(y_t[u, v]\) is the share of the total interaction weight of \(u\) in the window \((t, t + k]\) that goes to \(v\).

Through these improvements, the TGB platform provides a more comprehensive and reliable evaluation tool for machine learning research on temporal graphs.
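
To make the MRR formula and the \(y_t[u, v]\) definition above concrete, here is a minimal sketch in plain Python. It does not use the TGB package and is not its API: the function names `mean_reciprocal_rank` and `node_property_target`, the `(src, dst, time, weight)` edge layout, and the tie-handling rule (a tied negative counts as half a rank) are all assumptions made for illustration.

```python
from collections import defaultdict


def mean_reciprocal_rank(pos_scores, neg_scores):
    """Average of 1 / rank(v) over test edges.

    pos_scores[i] is the model score of the true destination of edge i;
    neg_scores[i] is the list of scores of that edge's negative samples.
    Assumption: a negative with exactly the same score adds 0.5 to the rank.
    """
    total = 0.0
    for pos, negs in zip(pos_scores, neg_scores):
        higher = sum(1 for s in negs if s > pos)
        ties = sum(1 for s in negs if s == pos)
        rank = 1 + higher + 0.5 * ties
        total += 1.0 / rank
    return total / len(pos_scores)


def node_property_target(edges, u, t, k):
    """Compute y_t[u, v] for every v reached from u in the window (t, t + k].

    edges is an iterable of (src, dst, time, weight) tuples (assumed layout).
    Returns a dict mapping each destination v to its normalized weight;
    the dict is empty when u has no outgoing edges in the window.
    """
    totals = defaultdict(float)
    for src, dst, t_i, w in edges:
        if src == u and t < t_i <= t + k:
            totals[dst] += w
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {v: w / norm for v, w in totals.items()}


# Toy data: two test edges with their negative-sample scores.
mrr = mean_reciprocal_rank(
    pos_scores=[0.9, 0.2],
    neg_scores=[[0.1, 0.5], [0.3, 0.4, 0.1]],
)

# Toy weighted edge stream; only edges from "u" with 0 < time <= 3 count.
toy_edges = [
    ("u", "a", 1, 2.0),
    ("u", "b", 2, 6.0),
    ("u", "a", 5, 1.0),  # outside the window
    ("x", "a", 1, 9.0),  # different source node
]
target = node_property_target(toy_edges, u="u", t=0, k=3)
```

In the toy data the first positive outranks both of its negatives (rank 1) and the second is beaten by two of its three negatives (rank 3), so `mrr` is the mean of 1 and 1/3. The `target` dictionary sums to 1 over the candidate nodes, which is what makes it suitable for a ranking metric such as NDCG.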