Tailoring Self-Attention for Graph via Rooted Subtrees

Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin
DOI: https://doi.org/10.48550/arXiv.2310.05296
IF: 5.414
2023-10-08
Machine Learning
Abstract: Attention mechanisms have made significant strides in graph learning, yet they still exhibit notable limitations: local attention faces challenges in capturing long-range information due to the inherent problems of the message-passing scheme, while global attention cannot reflect the hierarchical neighborhood structure and fails to capture fine-grained local information. In this paper, we propose a novel multi-hop graph attention mechanism, named Subtree Attention (STA), to address the aforementioned issues. STA seamlessly bridges the fully-attentional structure and the rooted subtree, with theoretical proof that STA approximates the global attention under extreme settings. By allowing direct computation of attention weights among multi-hop neighbors, STA mitigates the inherent problems in existing graph attention mechanisms. Further, we devise an efficient form for STA by employing kernelized softmax, which yields a linear time complexity. Our resulting GNN architecture, the STAGNN, presents a simple yet performant STA-based graph neural network leveraging a hop-aware attention strategy. Comprehensive evaluations on ten node classification datasets demonstrate that STA-based models outperform existing graph transformers and mainstream GNNs. The code is available at https://github.com/LUMIA-Group/SubTree-Attention.
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

This paper addresses the limitations of attention mechanisms in Graph Neural Networks (GNNs). Existing local attention mechanisms struggle to capture long-range information, while global attention mechanisms fail to reflect the hierarchical neighborhood structure and to capture fine-grained local information. To overcome these issues, the authors propose a new multi-hop graph attention mechanism called Subtree Attention (STA).

### Key Issues

1. **Limitations of Local Attention Mechanisms**:
   - Local attention mechanisms attend only to 1-hop neighbors, resulting in a limited receptive field.
   - Even when multiple local attention layers are stacked to build deep models, these message-passing-based architectures still struggle to capture long-range dependencies because of over-smoothing and over-squashing.
2. **Limitations of Global Attention Mechanisms**:
   - Although global attention mechanisms can capture long-range information, they fail to reflect the hierarchical neighborhood structure and cannot capture fine-grained local information, which is crucial in many real-world scenarios.

### Solutions

To address these issues, the authors propose the Subtree Attention (STA) mechanism. Its main features are:

- **Multi-hop Attention**: The root node attends directly to distant neighbors within its rooted subtree, collecting information from the entire subtree within a single layer.
- **Avoiding Message-Passing Scheme Issues**: Because attention weights among multi-hop neighbors are computed directly rather than through stacked local attention layers, STA mitigates problems such as over-smoothing and over-squashing.
- **Hierarchical Capture of Neighborhood Structure**: Each node attends to its own rooted subtree, so STA captures the neighborhood structure hop by hop.
- **Efficient Algorithm**: By using kernelized softmax, STA achieves linear time complexity and avoids the need to store high powers of the adjacency matrix.

### Experimental Results

Through comprehensive evaluations on ten node classification datasets, the authors show that the STA-based model (STAGNN) outperforms existing graph transformers and mainstream GNNs. STAGNN also remains competitive under extremely deep architectures, which further supports its effectiveness.

### Conclusion

Subtree Attention (STA) combines the advantages of local and global attention, addresses the limitations of existing attention mechanisms in graph learning, and offers new insights for the design of Graph Neural Networks.
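
The kernelized-softmax idea behind the "Efficient Algorithm" point can be illustrated with a small pure-Python sketch. It is not the authors' implementation. It assumes that the hop-k attention weight from node i to node j is the (i, j) entry of the k-th power of a row-normalized adjacency matrix multiplied by a kernel similarity between the query and the key, and it assumes the feature map ELU(x) + 1. All names (`phi`, `propagate`, `hop_attention_linear`, `hop_attention_naive`, `ADJ`) and the toy graph are hypothetical. Under these assumptions, the per-node summaries built from keys and values can be pushed along edges hop by hop, so the powers of the adjacency matrix are never formed.

```python
import math

def phi(x):
    """Positive feature map ELU(x) + 1 (an assumed choice for this sketch)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def propagate(adj, states):
    """One hop over sparse neighbor lists: new[i] = sum_j adj[i][j] * states[j]."""
    width = len(states[0])
    new_states = []
    for neighbors in adj:
        acc = [0.0] * width
        for j, w in neighbors:
            for t in range(width):
                acc[t] += w * states[j][t]
        new_states.append(acc)
    return new_states

def hop_attention_linear(adj, Q, K, V, hops):
    """Hop-k attention without materializing the k-th adjacency power."""
    m, d = len(K[0]), len(V[0])
    # Per-node state: flattened outer product phi(k_j) v_j^T, followed by phi(k_j).
    states = []
    for k, v in zip(K, V):
        fk = phi(k)
        states.append([fk[a] * v[b] for a in range(m) for b in range(d)] + fk)
    for _ in range(hops):
        states = propagate(adj, states)
    out = []
    for q, s in zip(Q, states):
        fq = phi(q)
        denom = dot(fq, s[m * d:])  # normalizer: phi(q_i) . propagated phi(k)
        out.append([sum(fq[a] * s[a * d + b] for a in range(m)) / denom
                    for b in range(d)])
    return out

def hop_attention_naive(adj, Q, K, V, hops):
    """Reference version that builds the dense k-th adjacency power explicitly."""
    n, d = len(adj), len(V[0])
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(hops):
        P = [[sum(w * P[j][c] for j, w in adj[i]) for c in range(n)]
             for i in range(n)]
    out = []
    for i in range(n):
        fq = phi(Q[i])
        weights = [P[i][j] * dot(fq, phi(K[j])) for j in range(n)]
        total = sum(weights)
        out.append([sum(weights[j] * V[j][b] for j in range(n)) / total
                    for b in range(d)])
    return out

# Toy path graph 0-1-2-3 with row-normalized edge weights (illustrative data).
ADJ = [[(1, 1.0)], [(0, 0.5), (2, 0.5)], [(1, 0.5), (3, 0.5)], [(2, 1.0)]]
Q = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2], [0.0, 1.0]]
K = [[0.3, 0.1], [-0.2, 0.5], [0.6, -0.4], [0.2, 0.2]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]]

for hops in (1, 2, 3):
    print(hops, hop_attention_linear(ADJ, Q, K, V, hops))
```

In this sketch both functions return the same values up to floating-point error. The difference is cost: the naive version works with a dense n-by-n matrix, while the propagated version touches each edge once per hop and keeps only a fixed-size summary per node.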