Abstract: Transformers have recently emerged as powerful neural networks for graph learning, showcasing state-of-the-art performance on several graph property prediction tasks. However, these results have been limited to small-scale graphs, where the global attention mechanism is computationally feasible. The next goal is to scale up these architectures to handle very large graphs on the scale of millions or even billions of nodes. With large-scale graphs, global attention learning proves impractical due to its quadratic complexity w.r.t. the number of nodes. On the other hand, neighborhood sampling techniques become essential to manage large graph sizes, yet finding the optimal trade-off between speed and accuracy with sampling techniques remains challenging. This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints for developing scalable graph transformer (GT) architectures. We argue that such a GT requires layers that can adeptly learn both local and global graph representations while swiftly sampling the graph topology. As such, a key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism that encompasses a 4-hop receptive field, achieved through just 2-hop operations. This local node embedding is then integrated with a global node embedding, acquired via another self-attention layer with an approximate global codebook, before finally being sent through a downstream layer for node predictions. The proposed GT framework, named LargeGT, overcomes previous computational bottlenecks and is validated on three large-scale node classification benchmarks. We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-papers100M with a 5.9% performance improvement.
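
The claim that a 4-hop receptive field can be reached with only 2-hop operations can be illustrated with a small sketch. It assumes one plausible reading of the abstract, not necessarily LargeGT's actual mechanism: each node first carries a summary of its own 2-hop neighborhood, and a local layer then combines the summaries of the node's 2-hop neighbors. The toy path graph and the names `within_hops`, `two_hop_context`, and `receptive_field` are hypothetical, and sets of node ids stand in for learned embeddings. The sketch uses full neighborhoods rather than sampled ones.

```python
from collections import deque


def within_hops(adj, source, max_hops):
    """Return the set of nodes reachable from `source` in at most `max_hops` edges."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in adj[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return set(depth)


# Hypothetical toy graph: the path 0-1-2-3-4-5-6.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}

# First 2-hop operation: every node summarizes its own 2-hop neighborhood.
# (A set of node ids stands in for an aggregated feature vector.)
two_hop_context = {v: within_hops(adj, v, 2) for v in adj}


def receptive_field(node):
    """Second 2-hop operation: combine the summaries of the node's 2-hop neighbors."""
    field = set()
    for nbr in within_hops(adj, node, 2):
        field |= two_hop_context[nbr]
    return field


# Node 0 now depends on every node within 4 hops, and on nothing farther away.
print(sorted(receptive_field(0)))
```

In this sketch no step ever looks farther than 2 hops from the node it processes, yet the information reaching node 0 spans its 4-hop neighborhood. A sampling-based variant would replace the full 2-hop neighbor set in `receptive_field` with a fixed-size sample, trading coverage for speed.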
