On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers

Cai Zhou, Rose Yu, Yusu Wang
2024-04-04
Abstract: Graph transformers have recently received significant attention in graph learning, partly due to their ability to capture more global interaction via self-attention. Nevertheless, while higher-order graph neural networks have been reasonably well studied, the exploration of extending graph transformers to higher-order variants is just starting. Both theoretical understanding and empirical results are limited. In this paper, we provide a systematic study of the theoretical expressive power of order-$k$ graph transformers and sparse variants. We first show that an order-$k$ graph transformer without additional structural information is less expressive than the $k$-Weisfeiler-Lehman ($k$-WL) test despite its high computational cost. We then explore strategies to both sparsify and enhance the higher-order graph transformers, aiming to improve both their efficiency and expressiveness. Indeed, sparsification based on neighborhood information can enhance the expressive power, as it provides additional information about input graph structures. In particular, we show that a natural neighborhood-based sparse order-$k$ transformer model is not only computationally efficient but also as expressive as the $k$-WL test. We further study several other sparse graph attention models that are computationally efficient and provide their expressiveness analysis. Finally, we provide experimental results to show the effectiveness of the different sparsification strategies.
Machine Learning, Computational Geometry, General Topology
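The abstract measures expressiveness against the $k$-WL test. As background, here is a minimal sketch of that test, assuming the standard (non-folklore) variant: every $k$-tuple starts from its atomic type (its equality and adjacency pattern), and each round refines a tuple's color using its $k$ neighborhoods, the $j$-th one obtained by replacing coordinate $j$ with every vertex. The names `atomic_type` and `kwl_histograms` are hypothetical, this is not the authors' code, and the paper's exact convention may differ.

```python
from collections import Counter
from itertools import product


def atomic_type(t, adj):
    """Initial color of a k-tuple: its equality and adjacency pattern."""
    k = len(t)
    return tuple(
        (t[i] == t[j], t[j] in adj[t[i]]) for i in range(k) for j in range(k)
    )


def kwl_histograms(graphs, k, rounds):
    """Run `rounds` iterations of (non-folklore) k-WL on every graph.

    graphs: list of (n, edges) with vertices 0..n-1 and undirected edges.
    One palette is shared by all graphs, so the returned color histograms
    (one Counter per graph) can be compared directly.
    """
    palette = {}

    def compress(signature):
        # Map a hashable signature to a small integer color id.
        return palette.setdefault(signature, len(palette))

    states = []
    for n, edges in graphs:
        adj = {v: set() for v in range(n)}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        colors = {t: compress(("atom", atomic_type(t, adj)))
                  for t in product(range(n), repeat=k)}
        states.append((n, colors))

    for _ in range(rounds):
        new_states = []
        for n, colors in states:
            new = {}
            for t, c in colors.items():
                # j-th neighborhood: replace coordinate j by every vertex w.
                neigh = tuple(
                    tuple(sorted(colors[t[:j] + (w,) + t[j + 1:]]
                                 for w in range(n)))
                    for j in range(k)
                )
                new[t] = compress((c, neigh))
            new_states.append((n, new))
        states = new_states

    return [Counter(colors.values()) for _, colors in states]


# Usage: a 6-cycle versus two disjoint triangles (both 2-regular).
c6 = (6, [(i, (i + 1) % 6) for i in range(6)])
two_c3 = (6, [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])
h2 = kwl_histograms([c6, two_c3], k=2, rounds=3)
h3 = kwl_histograms([c6, two_c3], k=3, rounds=1)
print(h2[0] == h2[1], h3[0] == h3[1])  # True False
```

Under this convention, 2-WL cannot separate the two graphs, while 3-WL separates them immediately because only the second graph contains 3-tuples whose atomic type is a triangle. The per-round cost is on the order of $k \cdot n^{k+1}$ color lookups, which is the kind of neighborhood structure the sparse attention variants below exploit.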
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper primarily explores the theoretical expressive power and design space of higher-order graph transformers. Specifically:

1. **Theoretical Analysis**:
   - Investigates the theoretical expressive power of order-$k$ graph transformers and their sparse variants.
   - Shows that order-$k$ graph transformers without additional structural information are strictly less expressive than the $k$-Weisfeiler-Lehman ($k$-WL) test.
   - Proposes augmenting them with $k$-tuple index information, which makes them at least as expressive as the $k$-WL test.
2. **Efficiency Improvements**:
   - Explores ways to make higher-order graph transformers more efficient while retaining strong expressive power.
   - Proposes several sparse higher-order graph transformer models and analyzes their time complexity and expressive power.
   - A particularly notable model is based on neighbor attention, which is both cheaper to compute than dense order-$k$ attention and as expressive as the $k$-WL test.
3. **Sparse Variants**:
   - Studies sparse attention mechanisms based on neighbors, local neighbors, and virtual tuples.
   - Neighbor attention mirrors the $k$-WL update but has lower computational complexity than dense order-$k$ attention.
   - Local neighbor attention further enhances expressive power while keeping time complexity low.
   - Virtual tuple attention simplifies computation by introducing virtual tuples.
4. **Reduction of Input $k$-tuples**:
   - Explores lowering computational complexity by reducing the number of input $k$-tuples.
   - Includes methods such as selecting $k$-tuples based on simplicial complexes, and random sampling.
5. **Experimental Validation**:
   - Conducts experiments on synthetic and real-world datasets to validate the effectiveness of the different sparsification strategies.
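To make the cost gap between dense and neighbor attention concrete, here is a minimal sketch of neighbor attention on $k$-tuples under explicit assumptions: each tuple attends to the $k \cdot n$ tuples obtained by replacing one of its coordinates with any vertex (the same neighborhoods $k$-WL uses), features are scalars, and the query/key maps are fixed scalar weights `wq` and `wk`, hypothetical stand-ins for learned projections. All $k$ neighborhoods share one softmax here for brevity; the paper's layer may treat them separately and add structural information, and graph structure would enter only through the initial tuple features `h`. Dense order-$k$ attention scores all $n^k \times n^k = n^{2k}$ pairs, whereas this pattern scores $n^k \cdot kn = k\,n^{k+1}$.

```python
import math
import random
from itertools import product


def neighbor_tuples(t, j, n):
    """All k-tuples obtained from t by replacing coordinate j with a vertex w."""
    return [t[:j] + (w,) + t[j + 1:] for w in range(n)]


def neighbor_attention(h, n, k, wq=1.0, wk=1.0):
    """One single-head neighbor-attention step on scalar k-tuple features.

    h maps every k-tuple over range(n) to a float. Each query tuple attends only
    to its k*n neighbors (t itself appears once per coordinate, when w == t[j]),
    instead of to all n**k tuples. wq and wk are illustrative scalar weights.
    """
    out = {}
    for t in product(range(n), repeat=k):
        keys = [u for j in range(k) for u in neighbor_tuples(t, j, n)]
        scores = [(wq * h[t]) * (wk * h[u]) for u in keys]
        m = max(scores)  # subtract the max score for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Values are the raw features (identity value map) for simplicity.
        out[t] = sum(a * h[u] for a, u in zip(weights, keys)) / z
    return out


def attention_pairs(n, k):
    """(query, key) pairs scored by dense order-k attention vs neighbor attention."""
    return n ** (2 * k), k * n ** (k + 1)


# Usage: random scalar features on all 2-tuples over 4 vertices.
random.seed(0)
n, k = 4, 2
h = {t: random.uniform(-1.0, 1.0) for t in product(range(n), repeat=k)}
out = neighbor_attention(h, n, k)
print(len(out), attention_pairs(10, 2))  # 16 (10000, 2000)
```

For $n = 10$ and $k = 2$, dense attention scores 10,000 pairs versus 2,000 for the neighbor pattern, and the ratio $n^{k-1}/k$ grows quickly with $k$. Because softmax weights are non-negative and sum to one, each output is a convex combination of neighbor features.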
Through these studies, the paper aims to strengthen both the theoretical foundations and the practical performance of higher-order graph transformers, particularly their computational efficiency and expressive power.