Abstract: Transformers serve as the backbone architectures of Foundational Models, where a domain-specific tokenizer helps them adapt to various domains. Graph Transformers (GTs) have recently emerged as a leading model in geometric deep learning, outperforming Graph Neural Networks (GNNs) in various graph learning tasks. However, the development of tokenizers for graphs has lagged behind other modalities, with existing approaches relying on heuristics or GNNs co-trained with Transformers. To address this, we introduce GQT (Graph Quantized Tokenizer), which decouples tokenizer training from Transformer training by leveraging multi-task graph self-supervised learning, yielding robust and generalizable graph tokens. Furthermore, the GQT utilizes Residual Vector Quantization (RVQ) to learn hierarchical discrete tokens, resulting in significantly reduced memory requirements and improved generalization capabilities. By combining the GQT with token modulation, a Transformer encoder achieves state-of-the-art performance on 16 out of 18 benchmarks, including large-scale homophilic and heterophilic datasets. The code is available at: https://github.com/limei0307/graph-tokenizer

What problem does this paper attempt to address?

The paper addresses the following problem: tokenizer development for graph-structured data lags behind that of other modalities, and existing approaches rely on heuristics or on graph neural networks (GNNs) co-trained with Transformers. This limits the performance of Graph Transformers (GTs) on graph learning tasks, especially on large-scale and heterophilic graphs. The paper therefore proposes the Graph Quantized Tokenizer (GQT), which decouples tokenizer training from Transformer training through multi-task graph self-supervised learning, yielding robust and generalizable graph tokens. Specifically, the paper addresses the following key issues:

1. **Limitations of existing tokenizers**: Existing graph tokenizers rely on heuristics or on GNNs co-trained with Transformers, which limits tokenizer quality and keeps Transformers from reaching their full potential.
2. **Decoupling of tokenizers and Transformers**: To improve the generalization and efficiency of the tokenizer, the paper trains it with multi-task graph self-supervised learning, independently of the Transformer, which yields more robust tokens.
3. **Reduction of memory requirements**: By using Residual Vector Quantization (RVQ), GQT learns hierarchical discrete tokens, which significantly reduces memory requirements and improves generalization.
4. **Enhancement of long-range dependency capture**: By combining semantic edges with random-walk techniques, GQT helps Transformers capture long-range dependencies in graphs, improving performance across graph learning tasks.

With these improvements, a Transformer encoder using GQT achieves state-of-the-art performance on 16 out of 18 benchmarks, including large-scale homophilic and heterophilic graph datasets.
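The hierarchical discrete tokens mentioned in point 3 can be illustrated with a minimal sketch of residual vector quantization. This is not the authors' implementation: the function names (`nearest`, `rvq_encode`, `rvq_decode`) and the hand-picked two-level codebooks are assumptions made for illustration, and in practice the codebooks would be learned rather than fixed.

```python
def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((c - v) ** 2 for c, v in zip(code, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(vec, codebooks):
    """Quantize vec into one discrete token per codebook level.

    Each level quantizes the residual left over by the previous levels,
    so tokens go from coarse to fine.
    """
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen codeword; the next level sees what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Reconstruct an approximate vector by summing the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical codebooks: a coarse level followed by a fine level.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]],
]

tokens = rvq_encode([1.1, 1.0], codebooks)
reconstruction = rvq_decode(tokens, codebooks)
print(tokens, reconstruction)
```

In this sketch each embedding is stored as one small integer per codebook level instead of a full floating-point vector, which is the kind of saving the memory-reduction claim refers to.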

Learning Graph Quantized Tokenizers for Transformers

GQWformer: A Quantum-based Transformer for Graph Representation Learning

VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections

GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets

Technical Report: The Graph Spectral Token -- Enhancing Graph Transformers with Spectral Information

Enhancing Graph Neural Networks with Quantum Computed Encodings

TorchGT: A Holistic System for Large-scale Graph Transformer Training

KDLGT: A Linear Graph Transformer Framework Via Kernel Decomposition Approach

Retrofitting Temporal Graph Neural Networks with Transformer

Graph Transformers for Large Graphs

A General and Efficient Training for Transformer via Token Expansion

Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision

Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing

HHGT: Hierarchical Heterogeneous Graph Transformer for Heterogeneous Graph Representation Learning

Tokenized Graph Transformer with Neighborhood Augmentation for Node Classification in Large Graphs

Quantum linear algebra is all you need for Transformer architectures

DHIL-GT: Scalable Graph Transformer with Decoupled Hierarchy Labeling

GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation

Hierarchical Transformer for Scalable Graph Learning

Efficient Video Transformers with Spatial-Temporal Token Selection