Abstract: Attention mechanisms have made significant strides in graph learning, yet they still exhibit notable limitations: local attention faces challenges in capturing long-range information due to the inherent problems of the message-passing scheme, while global attention cannot reflect the hierarchical neighborhood structure and fails to capture fine-grained local information. In this paper, we propose a novel multi-hop graph attention mechanism, named Subtree Attention (STA), to address the aforementioned issues. STA seamlessly bridges the fully-attentional structure and the rooted subtree, with a theoretical proof that STA approximates the global attention under extreme settings. By allowing direct computation of attention weights among multi-hop neighbors, STA mitigates the inherent problems in existing graph attention mechanisms. Further, we devise an efficient form for STA by employing kernelized softmax, which yields a linear time complexity. Our resulting GNN architecture, STAGNN, is a simple yet performant STA-based graph neural network leveraging a hop-aware attention strategy. Comprehensive evaluations on ten node classification datasets demonstrate that STA-based models outperform existing graph transformers and mainstream GNNs. The code is available at https://github.com/LUMIA-Group/SubTree-Attention.
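The linear-time claim relies on the reassociation trick used in kernelized (linear) attention. If the softmax similarity exp(q·k) is replaced by a feature-map inner product φ(q)ᵀφ(k), the N×N attention matrix never has to be formed. The NumPy sketch below illustrates only that trick. It assumes φ(x) = elu(x) + 1, a common choice in the linear-transformer literature that is not necessarily the one STA uses, and the function names are hypothetical.

```python
import numpy as np

def feature_map(x):
    # Assumed kernel feature map: elu(x) + 1, strictly positive so normalizers never vanish
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def quadratic_attention(Q, K, V):
    # Builds the full N x N similarity matrix: O(N^2 d) time, O(N^2) memory
    S = feature_map(Q) @ feature_map(K).T
    return (S @ V) / S.sum(axis=1, keepdims=True)

def linear_attention(Q, K, V):
    # Reassociates (phi(Q) phi(K)^T) V as phi(Q) (phi(K)^T V): O(N d^2) time
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = phi_k.T @ V            # (d, d_v) key-value summary shared by every query
    z = phi_k.sum(axis=0)       # (d,) summary for the softmax-style normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
print(np.allclose(quadratic_attention(Q, K, V), linear_attention(Q, K, V)))  # True
```

Both functions compute the same normalized weights. Only the order of the matrix products differs, and that change removes the quadratic dependence on the number of nodes.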

What problem does this paper attempt to address?

### The Problem Addressed by the Paper

This paper addresses the limitations of attention mechanisms in Graph Neural Networks (GNNs). Existing local attention mechanisms struggle to capture long-range information. Global attention mechanisms, in contrast, do not reflect the hierarchical neighborhood structure and fail to capture fine-grained local information. To overcome these issues, the authors propose a new multi-hop graph attention mechanism called Subtree Attention (STA).

### Key Issues and Solutions

1. **Limitations of Local Attention Mechanisms**:
   - Local attention mechanisms attend only to 1-hop neighbors, which limits the receptive field of each layer.
   - Stacking multiple local attention layers into a deep model does not solve this. Such message-passing architectures still struggle to capture long-range dependencies because of over-smoothing and over-squashing.
2. **Limitations of Global Attention Mechanisms**:
   - Global attention mechanisms can capture long-range information, but they do not reflect the hierarchical neighborhood structure and cannot capture fine-grained local information, which is crucial in many real-world scenarios.

### Solutions

To address these issues, the authors propose the Subtree Attention (STA) mechanism. Its main features are:

- **Multi-hop Attention**: The root node attends directly to distant neighbors in its rooted subtree, collecting information from the entire subtree within a single layer.
- **Avoiding Message-Passing Issues**: Compared with stacked local attention layers, STA avoids problems such as over-smoothing and over-squashing.
- **Hierarchical Capture of Neighborhood Structure**: Because each node attends to its own rooted subtree, STA captures the neighborhood structure hop by hop.
- **Efficient Algorithm**: Using kernelized softmax, STA achieves linear time complexity and avoids the cost of explicitly computing and storing high powers of the adjacency matrix.

### Experimental Results

Comprehensive evaluations on ten node classification datasets show that the STA-based model (STAGNN) outperforms existing graph transformers and mainstream GNNs. STAGNN also remains competitive even with extremely deep architectures, which further supports its effectiveness.

### Conclusion

Subtree Attention (STA) combines the advantages of local and global attention, addresses the limitations of existing attention mechanisms in graph learning, and offers new insights for the design of Graph Neural Networks.
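Multi-hop attention and the efficient algorithm can be combined by propagating per-node key/value summaries along the graph k times, rather than summing them over all nodes. The NumPy sketch below is an illustration under stated assumptions and is not the authors' code. It assumes that hop k weights the pair (i, j) by the (i, j) entry of the k-th power of a row-normalized adjacency matrix with self-loops, times φ(q_i)ᵀφ(k_j). It also assumes φ(x) = elu(x) + 1 and uses a softmax over per-hop scores as a stand-in for STAGNN's hop-aware strategy. All function names are hypothetical.

```python
import numpy as np

def feature_map(x):
    # Assumed kernel feature map: elu(x) + 1 (strictly positive)
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def normalize_adjacency(A):
    # Assumed hop operator: row-normalized A + I
    A_hat = A + np.eye(A.shape[0])
    return A_hat / A_hat.sum(axis=1, keepdims=True)

def subtree_attention(Q, K, V, A_hat, num_hops):
    """Return one (N, d_v) output per hop k = 1..num_hops.

    Hop k weights pair (i, j) by [A_hat^k]_ij * phi(q_i)^T phi(k_j);
    A_hat^k is never formed, only applied to per-node summaries.
    """
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = np.einsum("nd,ne->nde", phi_k, V)  # per-node outer products phi(k_j) v_j^T
    z = phi_k                                # per-node normalizer terms phi(k_j)
    outputs = []
    for _ in range(num_hops):
        # One more hop. Dense here for brevity; a sparse A_hat makes this O(|E| d d_v)
        kv = np.einsum("ij,jde->ide", A_hat, kv)
        z = A_hat @ z
        num = np.einsum("nd,nde->ne", phi_q, kv)
        den = np.einsum("nd,nd->n", phi_q, z)[:, None]
        outputs.append(num / den)
    return outputs

def hop_aware_combine(hop_outputs, hop_scores):
    # Stand-in for the hop-aware strategy: softmax over per-hop scores, then weighted sum
    w = np.exp(hop_scores - hop_scores.max())
    w = w / w.sum()
    return sum(wk * hk for wk, hk in zip(w, hop_outputs))

rng = np.random.default_rng(0)
N, d = 5, 4
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
A = np.zeros((N, N))
for i in range(N - 1):  # path graph 0-1-2-3-4
    A[i, i + 1] = A[i + 1, i] = 1.0
hops = subtree_attention(Q, K, V, normalize_adjacency(A), num_hops=3)
H = hop_aware_combine(hops, hop_scores=np.zeros(3))  # equal hop weights before training
print(len(hops), H.shape)  # 3 (5, 4)
```

Each hop costs one propagation of the summaries, so the number of hops controls how far down the rooted subtree a node can attend within a single layer. The hop scores would be learned in a real model. Zeros are used here only to give equal weights.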

Tailoring Self-Attention for Graph via Rooted Subtrees