Learning Graph Quantized Tokenizers for Transformers

Limei Wang, Kaveh Hassani, Si Zhang, Dongqi Fu, Baichuan Yuan, Weilin Cong, Zhigang Hua, Hao Wu, Ning Yao, Bo Long
2024-10-18
Abstract: Transformers serve as the backbone architectures of Foundational Models, where a domain-specific tokenizer helps them adapt to various domains. Graph Transformers (GTs) have recently emerged as a leading model in geometric deep learning, outperforming Graph Neural Networks (GNNs) in various graph learning tasks. However, the development of tokenizers for graphs has lagged behind other modalities, with existing approaches relying on heuristics or GNNs co-trained with Transformers. To address this, we introduce GQT (Graph Quantized Tokenizer), which decouples tokenizer training from Transformer training by leveraging multi-task graph self-supervised learning, yielding robust and generalizable graph tokens. Furthermore, GQT utilizes Residual Vector Quantization (RVQ) to learn hierarchical discrete tokens, resulting in significantly reduced memory requirements and improved generalization capabilities. By combining GQT with token modulation, a Transformer encoder achieves state-of-the-art performance on 16 out of 18 benchmarks, including large-scale homophilic and heterophilic datasets. The code is available at: https://github.com/limei0307/graph-tokenizer
Neural and Evolutionary Computing, Artificial Intelligence, Machine Learning
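The abstract attributes GQT's memory savings to Residual Vector Quantization (RVQ). The following is a minimal NumPy sketch of the RVQ encoding step, assuming the codebooks are simply given. In GQT the codebooks are learned during self-supervised tokenizer training, which is not shown here. Each node embedding is encoded level by level: pick the codeword nearest to the current residual, subtract it, and repeat, so a node becomes a short tuple of integer codes. The function names `rvq_encode`, `rvq_decode`, and `bytes_per_node` are hypothetical and are not taken from the released code.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual vector quantization (illustrative sketch, not the GQT code).

    x:         (n, d) array of continuous node embeddings
    codebooks: list of L arrays, each of shape (K, d)
               (assumption: fixed and given, not learned here)
    returns:   (n, L) integer codes and the (n, d) reconstruction
    """
    residual = np.asarray(x, dtype=float).copy()
    recon = np.zeros_like(residual)
    codes = []
    for cb in codebooks:
        # Squared Euclidean distance from each residual to each codeword
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)   # nearest codeword per node at this level
        chosen = cb[idx]
        codes.append(idx)
        recon += chosen              # running reconstruction
        residual -= chosen           # what the next level still has to explain
    return np.stack(codes, axis=1), recon

def rvq_decode(codes, codebooks):
    """Reconstruct embeddings by summing the selected codeword of every level."""
    return sum(cb[codes[:, level]] for level, cb in enumerate(codebooks))

def bytes_per_node(d, num_levels, codebook_size, float_bytes=4):
    """Per-node storage: dense float vector vs. RVQ codes (codebooks excluded)."""
    dense = d * float_bytes
    code_bits = num_levels * int(np.ceil(np.log2(codebook_size)))
    return dense, code_bits / 8
```

Under these assumptions, with 128-dimensional float32 features, three levels, and 256 codewords per level, `bytes_per_node(128, 3, 256)` returns `(512, 3.0)`. That is 512 bytes for the dense vector versus 3 bytes of codes per node, plus the codebooks, which are shared by all nodes.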
What problem does this paper attempt to address?
This paper addresses the gap between tokenizers for graph-structured data and those for other modalities: existing graph tokenizers rely on heuristics or on graph neural networks (GNNs) co-trained with the Transformer. This limits the performance of Graph Transformers (GTs) on graph-learning tasks, especially on large-scale and heterophilic graphs. The paper therefore proposes the Graph Quantized Tokenizer (GQT), which decouples tokenizer training from Transformer training through multi-task graph self-supervised learning, producing robust and generalizable graph tokens. Specifically, the paper addresses the following key issues:
1. **Limitations of existing tokenizers**: Existing graph tokenizers rely on heuristics or on GNNs co-trained with Transformers, which limits tokenizer quality and prevents Transformers from reaching their full potential.
2. **Decoupling tokenizers from Transformers**: To improve the generalization and efficiency of the tokenizer, the paper trains it with multi-task graph self-supervised learning, independently of the Transformer, which yields more robust tokens.
3. **Reducing memory requirements**: By using Residual Vector Quantization (RVQ), GQT learns hierarchical discrete tokens, which significantly reduces memory requirements and improves generalization.
4. **Capturing long-range dependencies**: By combining semantic edges with random-walk techniques, GQT helps the Transformer capture long-range dependencies in graphs, improving performance across graph-learning tasks.
With these improvements, and combined with token modulation, GQT enables a Transformer encoder to achieve state-of-the-art performance on 16 of 18 benchmarks, including large-scale homophilic and heterophilic datasets.
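The summary credits part of GQT's long-range gains to semantic edges and random walks but does not say how either is constructed. The following is a minimal sketch of one way such pieces could fit together, and it is not the authors' procedure. It assumes three things: semantic edges arrive as an extra edge list, each node already has a hashable discrete token tuple (for instance its RVQ codes), and a node's input sequence is built from uniform random walks over the union of graph and semantic edges, deduplicated in visit order. All function names (`build_adjacency`, `random_walk`, `node_token_sequence`) are hypothetical.

```python
import random
from collections import defaultdict

def build_adjacency(edges, semantic_edges):
    """Undirected adjacency over graph edges plus semantic edges.

    Assumption: semantic edges are supplied externally as (u, v) pairs;
    how GQT actually derives them is not modeled here.
    """
    adj = defaultdict(set)
    for u, v in list(edges) + list(semantic_edges):
        adj[u].add(v)
        adj[v].add(u)
    return adj

def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = adj.get(walk[-1])
        if not nbrs:            # isolated node: stop early
            break
        walk.append(rng.choice(sorted(nbrs)))
    return walk

def node_token_sequence(adj, codes, start, num_walks, walk_length, seed=0):
    """Hypothetical Transformer input for `start`: the discrete tokens of the
    nodes visited by several random walks, deduplicated in visit order."""
    rng = random.Random(seed)
    seen, seq = set(), []
    for _ in range(num_walks):
        for node in random_walk(adj, start, walk_length, rng):
            if node not in seen:
                seen.add(node)
                seq.append(codes[node])
    return seq
```

In this sketch a semantic edge such as `(0, 3)` on the path `0-1-2-3` lets a short walk from node 0 reach node 3 in one hop, which is the intuition behind using extra edges to shorten long-range paths. The walk length and number of walks trade sequence length against coverage of the neighborhood.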