Abstract: Although Transformer architectures have been successfully applied to graph data with the advent of Graph Transformers, the current design of Graph Transformers still relies heavily on human labor and expert knowledge to decide on proper neural architectures and suitable graph encoding strategies at each Transformer layer. Existing work on automated Transformer design focuses on non-graph data such as text and images and does not consider graph encoding strategies, so it fails to handle non-Euclidean graph data. In this paper, we study the problem of automated graph Transformer design for the first time. Solving this problem poses two challenges: i) how to design a unified search space for graph Transformers, and ii) how to deal with the coupling between Transformer architectures and the graph encodings of each Transformer layer. To address these challenges, we propose Automated Graph Transformer (AutoGT), a neural architecture search framework that automatically discovers optimal graph Transformer architectures by jointly optimizing the Transformer architecture and graph encoding strategies. Specifically, we first propose a unified graph Transformer formulation that can represent most state-of-the-art graph Transformer architectures. Based on this unified formulation, we design a graph Transformer search space that includes both candidate architectures and various graph encodings. To handle the coupling, we propose a novel encoding-aware performance estimation strategy that gradually trains and splits the supernets according to the correlations between graph encodings and architectures. The proposed strategy provides a more consistent and fine-grained performance prediction when evaluating jointly optimized graph encodings and architectures. Extensive experiments and ablation studies show that AutoGT improves over state-of-the-art hand-crafted baselines on all datasets, demonstrating its effectiveness and wide applicability.
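
To make the idea of a joint search space concrete, the following is a minimal sketch of sampling one candidate that pairs architecture hyperparameters with a per-layer choice of graph encodings. All names and candidate values here (`ARCH_CHOICES`, `ENCODING_CHOICES`, `Candidate`, `sample_candidate`) are hypothetical and chosen for illustration; they are not the search space or the estimation strategy used by AutoGT.

```python
import random
from dataclasses import dataclass

# Hypothetical architecture options (illustrative values, not AutoGT's).
ARCH_CHOICES = {
    "num_layers": [2, 4, 6],
    "hidden_dim": [64, 128, 256],
    "num_heads": [2, 4, 8],
}

# Hypothetical graph encoding options that a layer may switch on or off.
ENCODING_CHOICES = ["centrality", "laplacian_pe", "spatial", "edge"]


@dataclass(frozen=True)
class Candidate:
    """One point in the joint (architecture, graph encoding) search space."""
    num_layers: int
    hidden_dim: int
    num_heads: int
    encodings: tuple  # one tuple of enabled encoding names per layer


def sample_candidate(rng: random.Random) -> Candidate:
    """Sample architecture and per-layer encodings together."""
    arch = {name: rng.choice(options) for name, options in ARCH_CHOICES.items()}
    # Each layer independently enables each encoding with probability 0.5.
    encodings = tuple(
        tuple(enc for enc in ENCODING_CHOICES if rng.random() < 0.5)
        for _ in range(arch["num_layers"])
    )
    return Candidate(encodings=encodings, **arch)


def search_space_size() -> int:
    """Count candidates: each layer has 2**len(ENCODING_CHOICES) encoding subsets."""
    per_layer = 2 ** len(ENCODING_CHOICES)
    other = len(ARCH_CHOICES["hidden_dim"]) * len(ARCH_CHOICES["num_heads"])
    return sum(other * per_layer ** n for n in ARCH_CHOICES["num_layers"])


if __name__ == "__main__":
    print(sample_candidate(random.Random(0)))
    print(search_space_size())
```

Because the encoding choice is made per layer, the number of candidates grows exponentially with depth, which is one reason a shared-weight supernet is typically used for performance estimation rather than training each candidate from scratch.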

AutoGTCO: Graph and Tensor Co-Optimize for Image Recognition with Transformers on GPU

GTCO: Graph and Tensor Co-Design for Transformer-Based Image Recognition on Tensor Cores

TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual Vision Transformer for Fast Arbitrary One-Shot Image Generation

AutoGraph: Optimizing DNN Computation Graph for Parallel GPU Kernel Execution

A Scalable and Effective Alternative to Graph Transformers

Boost Vision Transformer with GPU-Friendly Sparsity and Quantization

AutoGT: Automated Graph Transformer Architecture Search

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

Training Strategies for Vision Transformers for Object Detection

TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs

Galvatron

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

Graph neural networks with configuration cross-attention for tensor compilers

TRT-ViT: TensorRT-oriented Vision Transformer

GTC: GNN-Transformer Co-contrastive Learning for Self-supervised Heterogeneous Graph Representation

Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers

ETO: Accelerating Optimization of DNN Operators by High-Performance Tensor Program Reuse

A Coordinated Strategy for GNN Combining Computational Graph and Operator Optimizations

TOCO: A Systolic Network for Efficient Transposed Convolutions with Output-Reuse Paths