Graph Propagation Transformer for Graph Representation Learning

Zhe Chen, Hao Tan, Tao Wang, Tianrun Shen, Tong Lu, Qiuying Peng, Cheng Cheng, Yue Qi
2024-10-09
Abstract: This paper presents a novel transformer architecture for graph representation learning. The core insight of our method is to fully consider the information propagation among nodes and edges in a graph when building the attention module in the transformer blocks. Specifically, we propose a new attention mechanism called Graph Propagation Attention (GPA). It explicitly passes the information among nodes and edges in three ways, i.e. node-to-node, node-to-edge, and edge-to-node, which is essential for learning graph-structured data. On this basis, we design an effective transformer architecture named Graph Propagation Transformer (GPTrans) to further help learn graph data. We verify the performance of GPTrans in a wide range of graph learning experiments on several benchmark datasets. These results show that our method outperforms many state-of-the-art transformer-based graph models with better performance. The code will be released at https://github.com/czczup/GPTrans.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses two shortcomings of existing Transformer-based graph representation learning methods: they do not fully consider the relationships between nodes and edges in the graph, and they process graph data inefficiently. Specifically:

1. **Failure to fully utilize the relationships between nodes and edges**: Existing Transformer-based methods (such as Graphormer) simply fuse node and edge information through positional encoding, without explicitly using the complex relationships between nodes and edges in the graph structure. This limits the model's ability to understand graph data.
2. **Inefficient dual-FFN structure**: Some methods (such as Edge-augmented Graph Transformer, EGT) adopt a dual-path structure and introduce two feed-forward networks (FFN) in the Transformer block to update the node and edge embeddings separately. Although this design can handle edge information, it increases the amount of computation and lowers model efficiency.

To solve these problems, the authors propose a new Transformer architecture, **Graph Propagation Transformer (GPTrans)**. The key innovation of this model is a new attention mechanism, **Graph Propagation Attention (GPA)**, which explicitly passes information in three directions: node-to-node, node-to-edge, and edge-to-node. This design not only improves the model's ability to learn graph-structured data, but also avoids maintaining an FFN module dedicated to edge embeddings, thereby improving computational efficiency.
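The three propagation directions can be made concrete with a small NumPy sketch. It is only an illustration under stated assumptions, not the authors' implementation: it handles a single unbatched graph with no padding mask or dropout, the names `gpa`, `init_params`, `W_q`, `W_k`, `W_v`, and `W_fc` are hypothetical, and the softmax and sum in the edge-to-node step are assumed to run over the neighbor axis.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(d_node, d_edge, n_head, rng):
    # Hypothetical random weights; the names are illustrative only.
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    return {
        "W_q": w(d_node, d_node), "W_k": w(d_node, d_node), "W_v": w(d_node, d_node),
        "W_reduce": w(d_edge, n_head),  # edge embeddings -> per-head attention bias
        "W_expand": w(n_head, d_edge),  # per-head attention -> edge embeddings
        "W_fc": w(d_edge, d_node),      # aligns edge and node dimensions
        "W_o": w(d_node, d_node),       # final fusion of the two node embeddings
    }

def gpa(x_node, x_edge, p, n_head):
    """x_node: (N, d_node), x_edge: (N, N, d_edge). Returns updated (nodes, edges)."""
    N, d_node = x_node.shape
    d_head = d_node // n_head

    def heads(weight):
        # Project and split into heads: (n_head, N, d_head).
        return (x_node @ weight).reshape(N, n_head, d_head).transpose(1, 0, 2)

    Q, K, V = heads(p["W_q"]), heads(p["W_k"]), heads(p["W_v"])

    # Node-to-node: self-attention with a bias phi predicted from the edges.
    phi = (x_edge @ p["W_reduce"]).transpose(2, 0, 1)      # (n_head, N, N)
    A = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head) + phi   # (n_head, N, N)
    P = softmax(A, axis=-1)
    x_node_1 = (P @ V).transpose(1, 0, 2).reshape(N, d_node)

    # Node-to-edge: expand (A + softmax(A)) back to the edge dimension.
    x_edge_new = (A + P).transpose(1, 2, 0) @ p["W_expand"]  # (N, N, d_edge)

    # Edge-to-node: softmax-weighted sum of edge embeddings over neighbors
    # (axis 1 is an assumption), then a linear layer to the node dimension.
    pooled = (x_edge_new * softmax(x_edge_new, axis=1)).sum(axis=1)  # (N, d_edge)
    x_node_2 = pooled @ p["W_fc"]

    # Fuse the two node embeddings.
    return (x_node_1 + x_node_2) @ p["W_o"], x_edge_new

rng = np.random.default_rng(0)
params = init_params(d_node=8, d_edge=4, n_head=2, rng=rng)
nodes, edges = gpa(rng.normal(size=(5, 8)), rng.normal(size=(5, 5, 4)), params, n_head=2)
print(nodes.shape, edges.shape)
```

Note that the edge embeddings are updated directly from the attention map, so this sketch contains no separate edge feed-forward path; that is the property the summary credits for the efficiency gain over dual-FFN designs.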
### Working principle of the GPA module

The GPA module explicitly constructs information propagation paths in the following three ways:

- **Node-to-node**: Use the global self-attention mechanism to capture the relationships between nodes, and predict the layer-specific attention bias \(\phi\) from the edge embeddings through the parameter matrix \(W_{\text{reduce}}\) to enhance the flexibility of the attention map \(A\).

  \[ A=\frac{QK^{T}}{\sqrt{d_{\text{head}}}}+\phi, \quad x'_{\text{node}} = \text{softmax}(A)V \]

- **Node-to-edge**: Capture the similarity between node embeddings through the attention map \(A\), and expand it to the same dimension as the edge embeddings through the learnable matrix \(W_{\text{expand}}\) to achieve high-order spatial interaction.

  \[ x'_{\text{edge}}=(A + \text{softmax}(A))W_{\text{expand}} \]

- **Edge-to-node**: Multiply the updated edge embeddings \(x'_{\text{edge}}\) element-wise by their own softmax, sum the result along dimension 1, and align the dimensions of the edge embeddings and node embeddings through a fully connected layer (FC). Finally, add the two types of node embeddings and fuse them through \(W_O\).

  \[ x''_{\text{node}}=\text{FC}\left(\text{sum}\left(x'_{\text{edge}}\cdot\text{softmax}(x'_{\text{edge}}),\ \text{dim} = 1\right)\right) \]

  \[ x'''_{\text{node}}=(x'_{\text{node}}+x''_{\text{node}})W_O \]

In this way, GPTrans can not only model the relationships between nodes and edges in the graph more effectively, but also significantly improve the computational efficiency of the model.

### Experimental results

The experimental results show that GPTrans outperforms many existing Transformer-based graph models on multiple benchmark datasets. For example, on the PCQM4M and PCQM4Mv2 datasets, GPTrans achieves a lower mean absolute error (MAE). In addition, in molecular property prediction tasks (such as MolHIV and M)