Abstract: We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. TGB datasets are large-scale, span years in duration, incorporate both node- and edge-level prediction tasks, and cover a diverse set of domains including social, trade, transaction, and transportation networks. For both tasks, we design evaluation protocols based on realistic use cases. We extensively benchmark each dataset and find that the performance of common models can vary drastically across datasets. In addition, on dynamic node property prediction tasks, we show that simple methods often achieve superior performance compared to existing temporal graph models. We believe that these findings open up opportunities for future research on temporal graphs. Finally, TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research, including data loading, experiment setup, and performance evaluation. TGB will be maintained and updated on a regular basis and welcomes community feedback. TGB datasets, data loaders, example code, evaluation setup, and leaderboards are publicly available at https://tgb.complexdatalab.com/.

What problem does this paper attempt to address?

This paper addresses three problems:

1. **Limitations of existing temporal graph datasets**:
   - Existing temporal graph benchmark datasets are small and do not reflect the scale of real-world networks.
   - The datasets cover few domains, concentrating mainly on social and interaction networks.
   - Large-scale datasets for dynamic node property prediction are lacking.
2. **Problems with existing evaluation protocols**:
   - Existing evaluation protocols are overly optimistic, so reported results do not match how models perform in practical applications.
   - In dynamic link prediction, the commonly used negative sampling method is too simple: it tends to produce negative samples that are easy to distinguish from true edges, which makes the measured performance inaccurate.
3. **Lack of a standardized evaluation platform**:
   - There is no open, standardized benchmark platform for evaluating machine learning models on temporal graphs realistically and reproducibly.

To address these problems, the paper proposes the **Temporal Graph Benchmark (TGB)**, a platform of challenging and diverse benchmark datasets that aims to provide more realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. Specific contributions include:

- **Large-scale and diverse datasets**: TGB contains datasets from different domains, covers node- and edge-level tasks, and its datasets are far larger than existing ones.
- **Improved evaluation protocols**: For dynamic link prediction, a more demanding negative sampling method is introduced and Mean Reciprocal Rank (MRR) is used as the evaluation metric; for dynamic node property prediction, the Normalized Discounted Cumulative Gain (NDCG) metric is used.
- **Public leaderboards and reproducible results**: TGB provides an automated pipeline that supports data loading, experiment setup, and performance evaluation, and tracks the latest progress through public leaderboards.

### Formula summary

- **Evaluation metric for the dynamic link prediction task**:

  \[ \text{MRR}=\frac{1}{\vert E_{\text{test}}\vert}\sum_{(u, v, t)\in E_{\text{test}}}\frac{1}{\text{rank}(v)} \]

  where \(E_{\text{test}}\) is the set of test edges and \(\text{rank}(v)\) is the rank of the positive sample \(v\) when it is scored together with the negative samples.

- **Prediction target (label) for the dynamic node property prediction task**:

  \[ y_t[u, v]=\frac{\sum_{t < t_i\leq t + k}w(u, v, t_i)}{\sum_{z\in N}\sum_{t < t_i\leq t + k}w(u, z, t_i)} \]

  where \(w(u, v, t_i)\) is the weight of the edge \((u, v, t_i)\), \(N\) is the set of candidate nodes, and \(k\) is the prediction window size. The label is therefore the share of node \(u\)'s total edge weight in the window \((t, t + k]\) that goes to node \(v\).

Through these improvements, the TGB platform provides a more comprehensive and reliable evaluation tool for machine learning research on temporal graphs.
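The MRR formula above can be illustrated with a short, self-contained sketch. This is not TGB's own evaluator: the function names `reciprocal_rank` and `mean_reciprocal_rank`, the plain-list inputs, and the pessimistic tie rule (a negative with an equal score is ranked ahead of the positive) are all assumptions made for illustration.

```python
from typing import Sequence


def reciprocal_rank(pos_score: float, neg_scores: Sequence[float]) -> float:
    """Reciprocal rank of one positive edge among its negative samples.

    Rank 1 means the positive outscores every negative. Ties count against
    the positive (an assumed, pessimistic tie rule).
    """
    rank = 1 + sum(1 for s in neg_scores if s >= pos_score)
    return 1.0 / rank


def mean_reciprocal_rank(
    pos_scores: Sequence[float],
    neg_scores_per_edge: Sequence[Sequence[float]],
) -> float:
    """Average the reciprocal rank over all test edges (u, v, t)."""
    if len(pos_scores) != len(neg_scores_per_edge):
        raise ValueError("need one list of negative scores per positive edge")
    if not pos_scores:
        raise ValueError("at least one test edge is required")
    ranks = [
        reciprocal_rank(p, negs)
        for p, negs in zip(pos_scores, neg_scores_per_edge)
    ]
    return sum(ranks) / len(ranks)


if __name__ == "__main__":
    # Three hypothetical test edges with made-up model scores.
    pos = [0.9, 0.4, 0.7]
    negs = [[0.1, 0.5], [0.6, 0.8, 0.2], [0.7]]
    print(mean_reciprocal_rank(pos, negs))
```

In the example, the three positives land at ranks 1, 3, and 2, so the result is the mean of 1, 1/3, and 1/2. Harder negative samples push positives to lower ranks, which is why the choice of negative sampling method changes the reported score.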
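For the dynamic node property prediction task, the formula for \(y_t[u, v]\) and the NDCG metric can be sketched together as follows. This is a minimal illustration under stated assumptions, not the benchmark's implementation: edges are given as hypothetical `(u, v, t_i, w)` tuples, the helper names `node_labels` and `ndcg_at_k` are invented here, and NDCG uses the standard linear-gain, log2-discount definition with an assumed cutoff `top_k`.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Edge = Tuple[Hashable, Hashable, float, float]  # (u, v, t_i, w), assumed layout


def node_labels(
    edges: Iterable[Edge], t: float, window: float
) -> Dict[Hashable, Dict[Hashable, float]]:
    """Compute y_t[u, v]: u's edge weight to v in (t, t + window],
    divided by u's total edge weight in the same window."""
    totals: Dict[Hashable, Dict[Hashable, float]] = defaultdict(
        lambda: defaultdict(float)
    )
    for u, v, t_i, w in edges:
        if t < t_i <= t + window:
            totals[u][v] += w
    labels = {}
    for u, per_target in totals.items():
        denom = sum(per_target.values())
        if denom > 0:  # skip nodes with no weight in the window
            labels[u] = {v: s / denom for v, s in per_target.items()}
    return labels


def ndcg_at_k(
    true_rel: Sequence[float], pred_scores: Sequence[float], top_k: int = 10
) -> float:
    """Standard NDCG@k: DCG of the predicted ordering over the ideal DCG."""
    order = sorted(
        range(len(pred_scores)), key=lambda i: pred_scores[i], reverse=True
    )[:top_k]
    dcg = sum(true_rel[i] / math.log2(pos + 2) for pos, i in enumerate(order))
    ideal = sorted(true_rel, reverse=True)[:top_k]
    idcg = sum(r / math.log2(pos + 2) for pos, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Made-up edges: the one at t_i = 5 falls outside the window (0, 3].
    edges = [("u", "a", 1, 2.0), ("u", "b", 2, 6.0), ("u", "a", 5, 100.0)]
    y = node_labels(edges, t=0, window=3)
    print(y["u"])
    print(ndcg_at_k([y["u"]["a"], y["u"]["b"]], pred_scores=[0.1, 0.9]))
```

Because each node's label is normalized to sum to 1 over the candidate set, the task becomes ranking candidates by their expected share of weight, which is what a ranking metric such as NDCG measures.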

Temporal Graph Benchmark for Machine Learning on Temporal Graphs
