Abstract: Transformers have recently emerged as powerful neural networks for graph learning, showing state-of-the-art performance on several graph property prediction tasks. However, these results have been limited to small-scale graphs, where the global attention mechanism is computationally feasible. The next goal is to scale these architectures to very large graphs with millions or even billions of nodes. On such graphs, global attention is impractical because its cost grows quadratically with the number of nodes. Neighborhood sampling techniques, in turn, become essential for managing large graph sizes, yet finding the right trade-off between speed and accuracy with sampling remains challenging. This work advances representation learning on single large-scale graphs, with a focus on identifying the model characteristics and critical design constraints for developing scalable graph transformer (GT) architectures. We argue that such a GT requires layers that can learn both local and global graph representations while sampling the graph topology quickly. A key innovation of this work is therefore a fast neighborhood sampling technique coupled with a local attention mechanism that covers a 4-hop receptive field but is achieved through just 2-hop operations. This local node embedding is then integrated with a global node embedding, obtained via another self-attention layer with an approximate global codebook, before finally being sent through a downstream layer for node predictions. The proposed GT framework, named LargeGT, overcomes previous computational bottlenecks and is validated on three large-scale node classification benchmarks. We report a 3x speedup and a 16.8% performance gain on ogbn-products and snap-patents, and we also scale LargeGT to ogbn-papers100M with a 5.9% performance improvement.
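The claim that 2-hop operations can cover a 4-hop receptive field can be illustrated with a toy sketch. The code below is not the LargeGT implementation: it assumes, for illustration only, that each node first stores a context summarising its own 2-hop neighborhood, and that a node then attends over the contexts of its 2-hop neighbors. Node sets stand in for embeddings, no sampling is performed, and the names `k_hop_ball` and `receptive_field_two_stage` are hypothetical.

```python
from collections import deque


def k_hop_ball(adj, source, k):
    """Return the set of nodes within k hops of `source` (inclusive), via BFS."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen


def receptive_field_two_stage(adj, source):
    """Nodes that can influence `source` when two 2-hop operations are stacked."""
    # Stage 1 (assumed): every node's context summarises its own 2-hop ball.
    context = {v: k_hop_ball(adj, v, 2) for v in adj}
    # Stage 2 (assumed): `source` reads the contexts of its 2-hop neighbors.
    field = set()
    for v in k_hop_ball(adj, source, 2):
        field |= context[v]
    return field


# Toy path graph: 0 - 1 - 2 - ... - 9
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}

two_stage = receptive_field_two_stage(adj, 0)
direct = k_hop_ball(adj, 0, 4)
print(sorted(two_stage))    # [0, 1, 2, 3, 4]
print(two_stage == direct)  # True
```

On this path graph, stacking the two 2-hop stages reaches exactly the nodes a direct 4-hop traversal from node 0 would reach, while each stage only ever walks two hops.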

Large-scale graph representation learning with very deep GNNs and self-supervision

DiscoGNN: A Sample-Efficient Framework for Self-Supervised Graph Representation Learning

NGAT: Attention in Breadth and Depth Exploration for Semi-Supervised Graph Representation Learning

OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs

A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking

On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks

GraphScale: A Framework to Enable Machine Learning over Billion-node Graphs

Large Generative Graph Models

LazyGNN: Large-Scale Graph Neural Networks via Lazy Propagation

DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs

GraphGPT: Graph Instruction Tuning for Large Language Models

Graph Attention Multi-Layer Perceptron

Node Feature Extraction by Self-Supervised Multi-scale Neighborhood Prediction

A Simple and Scalable Graph Neural Network for Large Directed Graphs

IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research

Scalable and Efficient Full-Graph GNN Training for Large Graphs

LiGNN: Graph Neural Networks at LinkedIn

Evaluating Deep Graph Neural Networks

On the Scalability of GNNs for Molecular Graphs

Graph Transformers for Large Graphs

SIGN: Scalable Inception Graph Neural Networks