Abstract: Graph transformers have recently received significant attention in graph learning, partly due to their ability to capture more global interaction via self-attention. Nevertheless, while higher-order graph neural networks have been reasonably well studied, the exploration of extending graph transformers to higher-order variants is just starting. Both theoretical understanding and empirical results are limited. In this paper, we provide a systematic study of the theoretical expressive power of order-$k$ graph transformers and sparse variants. We first show that an order-$k$ graph transformer without additional structural information is less expressive than the $k$-Weisfeiler Lehman ($k$-WL) test despite its high computational cost. We then explore strategies to both sparsify and enhance the higher-order graph transformers, aiming to improve both their efficiency and expressiveness. Indeed, sparsification based on neighborhood information can enhance the expressive power, as it provides additional information about input graph structures. In particular, we show that a natural neighborhood-based sparse order-$k$ transformer model is not only computationally efficient, but also expressive -- as expressive as the $k$-WL test. We further study several other sparse graph attention models that are computationally efficient and provide their expressiveness analysis. Finally, we provide experimental results to show the effectiveness of the different sparsification strategies.
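The abstract measures expressive power against the $k$-WL test. As a rough illustration of what that yardstick means, the sketch below implements the simplest case, $k = 1$ (colour refinement), and applies it to two small graphs. It is not the paper's construction: the function names `refine` and `wl_indistinguishable` and the example graphs are assumptions made for illustration only.

```python
from collections import Counter


def refine(adj):
    """1-WL colour refinement on an adjacency dict; returns stable node colours."""
    colors = {v: 0 for v in adj}  # every node starts with the same colour
    for _ in range(len(adj)):
        # Signature: own colour plus the sorted multiset of neighbour colours.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        # Relabel signatures with small integers.
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new = {v: palette[sigs[v]] for v in adj}
        # Refinement only splits classes, so an unchanged class count means stable.
        if len(set(new.values())) == len(set(colors.values())):
            return new
        colors = new
    return colors


def wl_indistinguishable(adj_a, adj_b):
    """True if 1-WL assigns both graphs the same colour histogram."""
    # Refine the disjoint union so colours are comparable across the two graphs.
    union = {("a", v): [("a", u) for u in nb] for v, nb in adj_a.items()}
    union.update({("b", v): [("b", u) for u in nb] for v, nb in adj_b.items()})
    colors = refine(union)
    hist_a = Counter(c for (tag, _), c in colors.items() if tag == "a")
    hist_b = Counter(c for (tag, _), c in colors.items() if tag == "b")
    return hist_a == hist_b


# A 6-cycle and two disjoint triangles are both 2-regular on 6 nodes.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# A 6-node path has two degree-1 endpoints.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

print(wl_indistinguishable(hexagon, two_triangles))
print(wl_indistinguishable(hexagon, path))
```

The hexagon and the two triangles are non-isomorphic, yet colour refinement cannot separate them because every node sees the same neighbourhood colours in each round. Higher-order tests refine colours of $k$-tuples of nodes instead of single nodes, which is the setting the order-$k$ transformers are compared against.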

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper primarily explores the theoretical expressive power and design space of Higher-Order Graph Transformers. Specifically:

1. **Theoretical Analysis**:
   - Investigates the theoretical expressive power of Order-k Graph Transformers and their sparse variants.
   - Points out that Higher-Order Graph Transformers without additional structural information are strictly less powerful than the k-Weisfeiler Lehman (k-WL) test.
   - Proposes enhancing the expressive power of Higher-Order Graph Transformers by adding k-tuple indexing information, making them at least as powerful as the k-WL test.
2. **Efficiency Improvements**:
   - Explores methods to improve the efficiency of Higher-Order Graph Transformers while maintaining their strong expressive power.
   - Proposes several sparse Higher-Order Graph Transformer models and analyzes their time complexity and expressive power.
   - One particularly interesting method is based on the Neighbor Attention mechanism, which not only has higher computational efficiency but also expressive power comparable to k-WL.
3. **Sparse Variants**:
   - Studies sparse attention mechanisms based on neighbors, local neighbors, and virtual tuples.
   - The Neighbor Attention mechanism is similar to k-WL but with lower computational complexity.
   - The Local Neighbor Attention mechanism further enhances expressive power while maintaining low time complexity.
   - The Virtual Tuple Attention mechanism simplifies computation by introducing virtual tuples.
4. **Reduction of Input k-tuples**:
   - Explores methods to reduce computational complexity by decreasing the input k-tuples.
   - Includes methods such as Simplicial Complexes selection and random sampling.
5. **Experimental Validation**:
   - Conducts experiments on synthetic and real-world datasets to validate the effectiveness of different sparsification strategies.
Through these studies, the paper aims to improve the theoretical foundation and practical performance of Higher-Order Graph Transformers, particularly in terms of computational efficiency and expressive power.
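To make the efficiency argument concrete, the following sketch counts attention pairs for a dense order-k model against a neighbor-based sparse one. It assumes the k-WL-style notion of neighborhood, in which a k-tuple attends only to tuples obtained by replacing a single entry with any node; the helper names `k_tuples`, `neighbor_tuples`, and `attention_pairs` are hypothetical, and the paper's exact definition may differ in detail.

```python
from itertools import product


def k_tuples(n, k):
    """All ordered k-tuples over nodes 0..n-1 (n**k of them)."""
    return list(product(range(n), repeat=k))


def neighbor_tuples(t, n):
    """Tuples that differ from t in at most one position (assumed k-WL-style neighbors)."""
    out = set()
    for j in range(len(t)):
        for u in range(n):
            out.add(t[:j] + (u,) + t[j + 1:])
    return out


def attention_pairs(n, k):
    """Return (dense, sparse) counts of query-key pairs."""
    tuples = k_tuples(n, k)
    dense = len(tuples) ** 2  # every tuple attends to every tuple
    sparse = sum(len(neighbor_tuples(t, n)) for t in tuples)
    return dense, sparse


for n, k in [(4, 2), (5, 3)]:
    dense, sparse = attention_pairs(n, k)
    print(f"n={n}, k={k}: dense={dense}, sparse={sparse}")
```

Under this assumption each tuple has k(n-1)+1 distinct neighbors including itself, so the sparse count grows as roughly k·n^(k+1) while the dense count grows as n^(2k). For n = 4 and k = 2 that is 112 pairs against 256, and the gap widens quickly with n and k.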

On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers
