Abstract: Transformers have achieved great success in several domains, including Natural Language Processing and Computer Vision. However, their application to real-world graphs is less explored, mainly due to their high computation cost and their poor generalizability caused by the lack of enough training data in the graph domain. To fill this gap, we propose a scalable Transformer-like dynamic graph learning method named Dynamic Graph Transformer (DyFormer) with spatial-temporal encoding to effectively learn graph topology and capture implicit links. To achieve efficient and scalable training, we propose a temporal-union graph structure and its associated subgraph-based node sampling strategy. To improve the generalization ability, we introduce two complementary self-supervised pre-training tasks and show, via an information-theoretic analysis, that jointly optimizing the two pre-training tasks results in a smaller Bayesian error rate. Extensive experiments on real-world datasets illustrate that DyFormer achieves a consistent 1%-3% AUC gain (averaged over all time steps) compared with baselines on all benchmarks.

What problem does this paper attempt to address?

This paper attempts to solve several key problems in dynamic graph learning:

1. **Missing or Spurious Links**:
   - Real-world static graphs may contain missing or spurious links, which make message passing in graph neural networks (GNNs) ineffective because information is exchanged between unrelated neighbors. In dynamic graphs this problem is more severe: GNNs cannot tell whether such links are caused by missing data or by the dynamic evolution of the graph, which may lead to poor generalization ability.
   - To solve this problem, DyFormer uses the fully-connected self-attention mechanism of the Transformer to model the relationships between all pairs of nodes, making it robust to missing and spurious links.
2. **Scalability Issue**:
   - Dynamic graphs grow over time, and the complexity of most static-graph GNNs depends on the size of the graph, so they do not scale to large graphs. Dynamic graphs add a further dependence on the number of time steps, which makes the computational cost worse.
   - To improve scalability, DyFormer proposes a new temporal-union graph structure, which aggregates the information of multiple time steps into a unified meta-graph, and develops a subgraph-based node sampling strategy, making the complexity independent of the graph size and the number of time steps.
3. **Generalization Ability**:
   - Existing dynamic-graph algorithms usually learn node representations on each static-graph snapshot and then aggregate these representations along the time dimension. These methods still suffer from the missing or spurious link problems described above, and aggregating information along the time dimension may further propagate errors, affecting the accuracy of downstream tasks.
   - DyFormer introduces two complementary self-supervised pre-training tasks to improve the generalization ability of the model and its robustness to missing or spurious links. An information-theoretic analysis shows that jointly optimizing these two tasks reduces the Bayes error rate, thus improving the generalization performance.

### Main Contributions

1. **Two-tower Transformer-based Method**:
   - DyFormer with spatio-temporal encoding is proposed, which can capture implicit edge connections beyond the input graph topology.
2. **Self-supervised Pre-training Tasks**:
   - Two complementary pre-training tasks are introduced, and their benefits to generalization ability and robustness are proved through an information-theoretic analysis.
3. **Temporal-union Graph Structure**:
   - An efficient temporal-union graph structure and a novel sampling strategy are proposed, making the complexity of DyFormer independent of the graph size and the number of time steps.
4. **Empirical Evaluation**:
   - A comprehensive experimental evaluation is carried out on real-world datasets, and the effectiveness of DyFormer is verified by ablation studies.

Through these improvements, DyFormer can achieve better performance and higher generalization ability in dynamic graph learning.
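To make the temporal-union graph and the subgraph-based node sampling more concrete, here is a minimal Python sketch. It is not the paper's implementation: the summary above does not specify the exact construction, so the sketch assumes the union graph is simply the set of edges seen at any time step (each annotated with the time steps at which it appears), and that sampling grows a bounded node set outward from the target nodes. The function names `build_temporal_union_graph` and `sample_subgraph_nodes`, and the parameters `max_nodes` and `seed`, are hypothetical.

```python
from collections import defaultdict
import random


def build_temporal_union_graph(snapshots):
    """Merge per-time-step edge lists into one undirected union graph.

    Assumption (not from the paper): the union graph keeps every edge that
    appears at any time step and records when it appears.

    Returns (adjacency, edge_times):
      adjacency  maps node -> set of neighbors seen at any time step
      edge_times maps (u, v) with u <= v -> list of time steps, ascending
    """
    adjacency = defaultdict(set)
    edge_times = defaultdict(list)
    for t, edges in enumerate(snapshots):
        for u, v in edges:
            key = (u, v) if u <= v else (v, u)
            if t not in edge_times[key]:
                edge_times[key].append(t)
            adjacency[u].add(v)
            adjacency[v].add(u)
    return dict(adjacency), dict(edge_times)


def sample_subgraph_nodes(adjacency, targets, max_nodes, seed=0):
    """Grow a node set outward from the targets, capped at max_nodes.

    Hypothetical sampling rule: breadth-first expansion over the union
    graph, visiting each node's neighbors in a seeded random order.
    """
    rng = random.Random(seed)
    selected = list(dict.fromkeys(targets))[:max_nodes]  # de-duplicate, keep order
    seen = set(selected)
    frontier = list(selected)
    while frontier and len(selected) < max_nodes:
        next_frontier = []
        for node in frontier:
            neighbors = sorted(adjacency.get(node, ()))
            rng.shuffle(neighbors)
            for nb in neighbors:
                if nb not in seen and len(selected) < max_nodes:
                    seen.add(nb)
                    selected.append(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return selected


# Three snapshots of a small graph, one edge list per time step.
snapshots = [
    [(0, 1), (1, 2)],
    [(1, 2), (2, 3)],
    [(3, 4), (0, 1)],
]
adjacency, edge_times = build_temporal_union_graph(snapshots)
nodes = sample_subgraph_nodes(adjacency, targets=[2], max_nodes=3)
```

Under these assumptions, self-attention would then run only over the sampled `nodes`, so its cost is bounded by `max_nodes` rather than by the total number of nodes or the number of snapshots, which is the kind of independence the summary describes.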

DyFormer: A Scalable Dynamic Graph Transformer with Provable Benefits on Generalization Ability
