Technical Report: The Graph Spectral Token -- Enhancing Graph Transformers with Spectral Information

Zihan Pengmei, Zimu Li
2024-04-08
Abstract: Graph Transformers have emerged as a powerful alternative to Message-Passing Graph Neural Networks (MP-GNNs) to address limitations such as over-squashing of information exchange. However, incorporating graph inductive bias into transformer architectures remains a significant challenge. In this report, we propose the Graph Spectral Token, a novel approach to directly encode graph spectral information, which captures the global structure of the graph, into the transformer architecture. By parameterizing the auxiliary [CLS] token and leaving other tokens representing graph nodes, our method seamlessly integrates spectral information into the learning process. We benchmark the effectiveness of our approach by enhancing two existing graph transformers, GraphTrans and SubFormer. The improved GraphTrans, dubbed GraphTrans-Spec, achieves over 10% improvements on large graph benchmark datasets while maintaining efficiency comparable to MP-GNNs. SubFormer-Spec demonstrates strong performance across various datasets.
Machine Learning
What problem does this paper attempt to address?
The paper proposes the Graph Spectral Token to address the difficulty of incorporating graph inductive bias into Graph Transformers. Graph Transformers avoid limitations of Message-Passing Graph Neural Networks (MP-GNNs) such as over-squashing of information exchange, but injecting global graph structure into the transformer architecture remains challenging. The method encodes graph spectral information in the Transformer's auxiliary [CLS] token while the remaining tokens represent graph nodes, so spectral information and regular node features are updated together, which is intended to enhance the model's expressive power. Experiments on multiple molecular modeling datasets show that the two enhanced models improve on their baselines: GraphTrans-Spec achieves over 10% improvements on large graph benchmark datasets while maintaining efficiency comparable to MP-GNNs, and SubFormer-Spec demonstrates strong performance across various datasets.
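The idea of placing spectral information in the [CLS] token can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the choice of the symmetric normalized Laplacian, the zero-padding of the spectrum to a fixed length, the two-layer MLP with random weights, and the names `laplacian_spectrum`, `spectral_cls_token`, `max_nodes`, and `d_model` are all assumptions made for illustration.

```python
import numpy as np


def laplacian_spectrum(adj: np.ndarray) -> np.ndarray:
    """Eigenvalues (ascending) of L = I - D^{-1/2} A D^{-1/2}.

    Assumption: the symmetric normalized Laplacian is used here only as
    one common way to obtain a graph spectrum.
    """
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    lap = np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)


def spectral_cls_token(eigvals, max_nodes, w1, w2):
    """Map a zero-padded spectrum to a d_model-sized vector (hypothetical MLP)."""
    padded = np.zeros(max_nodes)
    padded[: len(eigvals)] = eigvals
    hidden = np.maximum(padded @ w1, 0.0)  # ReLU
    return hidden @ w2


rng = np.random.default_rng(0)

# Toy graph: a 4-node cycle.
adj = np.array(
    [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]],
    dtype=float,
)

max_nodes, d_model = 8, 16                   # illustrative sizes
w1 = rng.normal(size=(max_nodes, 32))        # stand-ins for learned weights
w2 = rng.normal(size=(32, d_model))
node_tokens = rng.normal(size=(adj.shape[0], d_model))  # stand-in node features

eigvals = laplacian_spectrum(adj)
cls = spectral_cls_token(eigvals, max_nodes, w1, w2)

# The spectral [CLS] token is prepended; node tokens are left unchanged.
tokens = np.vstack([cls[None, :], node_tokens])
print(tokens.shape)
```

In a full model, `tokens` would be passed to a standard Transformer encoder, so that self-attention mixes the spectral token with the node tokens at every layer. Zero-padding is a simplification here, since padded entries cannot be distinguished from genuine zero eigenvalues.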