Abstract: Transformers have recently emerged as powerful neural networks for graph learning, showcasing state-of-the-art performance on several graph property prediction tasks. However, these results have been limited to small-scale graphs, where the global attention mechanism remains computationally feasible. The next goal is to scale up these architectures to handle very large graphs on the scale of millions or even billions of nodes. On large-scale graphs, global attention learning proves impractical due to its quadratic complexity w.r.t. the number of nodes. On the other hand, neighborhood sampling techniques become essential to manage large graph sizes, yet finding the optimal trade-off between speed and accuracy with sampling techniques remains challenging. This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints for developing scalable graph transformer (GT) architectures. We argue that such a GT requires layers that can adeptly learn both local and global graph representations while swiftly sampling the graph topology. As such, a key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism that encompasses a 4-hop receptive field, but is achieved through just 2-hop operations. This local node embedding is then integrated with a global node embedding, acquired via another self-attention layer with an approximate global codebook, before finally being sent through a downstream layer for node predictions. The proposed GT framework, named LargeGT, overcomes previous computational bottlenecks and is validated on three large-scale node classification benchmarks. We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-papers100M with a 5.9% performance improvement.
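To make the global branch described in the abstract more concrete, the following is a minimal NumPy sketch of attention from a mini-batch of nodes to a small codebook, followed by concatenation with the local embeddings. It is an illustration under stated assumptions, not the LargeGT implementation: the function names, the absence of learned query/key/value projections, the use of concatenation as the fusion step, and the random inputs are all hypothetical. The point it shows is only that attending to K codebook vectors costs on the order of B x K scores per batch of B nodes, rather than N x N scores for full global attention over N nodes.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def codebook_attention(h, codebook):
    """Attend from each node in a mini-batch to K codebook vectors.

    h:        (B, d) node embeddings for a mini-batch (hypothetical input)
    codebook: (K, d) approximate global summary vectors (assumed given)
    Learned projections are omitted here for brevity (an assumption).
    """
    d = h.shape[1]
    scores = h @ codebook.T / np.sqrt(d)  # (B, K) scaled dot-product scores
    weights = softmax(scores, axis=1)     # each row sums to 1
    return weights @ codebook, weights    # (B, d) global embedding, (B, K)


def combine_local_global(h_local, h_global):
    # Concatenation is one simple way to fuse the two views (an assumption).
    return np.concatenate([h_local, h_global], axis=1)


rng = np.random.default_rng(0)
h_local = rng.normal(size=(8, 16))   # stand-in for local attention output
codebook = rng.normal(size=(4, 16))  # stand-in for the global codebook
h_global, attn = codebook_attention(h_local, codebook)
h_fused = combine_local_global(h_local, h_global)
```

In this sketch `h_fused` has twice the feature width of `h_local` and would be passed to a downstream prediction layer; how the codebook itself is built and updated is left out because the abstract does not specify it.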
