Abstract: A current goal in the graph neural network literature is to enable transformers to operate on graph-structured data, given their success on language and vision tasks. Since the transformer's original sinusoidal positional encodings (PEs) are not applicable to graphs, recent work has focused on developing graph PEs, rooted in spectral graph theory or various spatial features of a graph. In this work, we introduce a new graph PE, Graph Automaton PE (GAPE), based on weighted graph-walking automata (a novel extension of graph-walking automata). We compare the performance of GAPE with other PE schemes on both machine translation and graph-structured tasks, and we show that it generalizes several other PEs. An additional contribution of this study is a theoretical and controlled experimental comparison of many recent PEs in graph transformers, independent of the use of edge features.

What problem does this paper attempt to address?

The core problem this paper addresses is how to design a suitable positional encoding (PE) for graph-structured data so that the Transformer can process graphs effectively. Specifically, the paper proposes a new positional encoding, Graph Automaton PE (GAPE), based on weighted graph-walking automata, and examines how it relates to, and performs against, existing positional encodings.

### Problem Background

The Transformer has been highly successful in natural language processing and computer vision, but its original positional encoding (the sinusoidal encoding) does not apply to graph-structured data. Researchers have therefore been exploring positional encodings designed for graphs, so that the Transformer can be used on the tasks usually handled by graph neural networks (GNNs). Existing graph positional encodings fall mainly into two categories:

1. **Spectral methods**: based on the eigenvalues and eigenvectors of the graph Laplacian matrix.
2. **Spatial methods**: based on local features of the graph, such as node degree and shortest-path distance.

However, each of these methods has its own limitations, and there is no unified framework that connects the different positional encodings.

### Paper Contributions

1. **Introduction of GAPE**: The paper proposes a new positional encoding, GAPE, based on weighted graph-walking automata (WGWA). In theory, GAPE can simulate the sinusoidal positional encoding, and it is also mathematically connected to other graph positional encodings.
2. **Theoretical and experimental comparison**: The paper gives a theoretical analysis and a controlled experimental comparison of several recently proposed graph positional encodings, independent of the use of edge features. This helps researchers understand the strengths and weaknesses of the different methods more clearly.
3. **Performance verification**: Through experiments on machine translation and on multiple graph-structured tasks, the paper evaluates the effectiveness and generality of GAPE. On machine translation, GAPE almost reaches the BLEU score of the original sinusoidal positional encoding, and it performs well on multiple graph-level and node-level tasks.
4. **Distributed representation**: GAPE can provide distributed representations of any dimension, without depending on the size of the graph or on a specific choice of features.

### Formula Representation

The core definitions of GAPE are as follows:

- Define the weighted graph-walking automaton \(M=(Q, S, \alpha, \mu, \tau)\) over a set of node labels \(\Sigma\), with \(k = |Q|\) and \(m = |\Sigma|\), where:
  - \(Q\) is the set of states.
  - \(S\subseteq Q\) is the set of starting states.
  - \(\alpha\in\mathbb{R}^{k\times m}\) is the initial weight matrix.
  - \(\mu:\Sigma\rightarrow\mathbb{R}^{k\times k}\) is the mapping from labels to transition weight matrices.
  - \(\tau\in\mathbb{R}^{k\times m}\) is the final weight matrix.
- For a graph \(G\) and a node \(v\), writing \(\ell(v)\) for the label of \(v\), GAPE is computed from:

  \[ P_{r,v}=\sum_{(q_1, v_1),\ldots,(q_T, v_T)\in R_{r,v}}\alpha_{q_1,\ell(v_1)}\prod_{t = 1}^{T - 1}\mu(\ell(v_t))_{q_t, q_{t+1}} \]

  where \(R_{r,v}\) is the set of all runs that start from any configuration and end in configuration \((r, v)\).
- The final positional encoding is:

  \[ \text{GAPE}_M(v)=P_{:,v}\odot\tau_{:,\ell(v)} \]

With these definitions, GAPE produces distributed positional encodings for the nodes of a graph, which the Transformer can then use on graph tasks.
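To make the run-sum behind \(P\) more concrete, here is a minimal NumPy sketch under several simplifying assumptions that are not taken from the summary above: every node carries the same label (so \(\mu\) reduces to a single \(k\times k\) matrix and \(\alpha\), \(\tau\) to length-\(k\) vectors), a run may only move along edges of the graph, and the weights are small enough for the sum over all runs to converge. Under those assumptions a run ending in \((r, v)\) is either the one-step run at \((r, v)\) or a shorter run extended by one edge, which gives the linear system \(P = \alpha\mathbf{1}^\top + \mu^\top P A\). The function name `gape_single_label` and the Kronecker-product solve are illustrative choices, not necessarily how the paper computes GAPE.

```python
import numpy as np


def gape_single_label(A, mu, alpha, tau):
    """Sketch of GAPE for a graph whose nodes all share one label (an assumption).

    A:     (n, n) adjacency matrix, A[u, v] = 1 if there is an edge u -> v.
    mu:    (k, k) transition weights, mu[q, r] = weight of moving from state q to r.
    alpha: (k,)   initial weights per state.
    tau:   (k,)   final weights per state.
    Returns a (k, n) array whose column v is the encoding of node v.
    """
    n = A.shape[0]
    k = mu.shape[0]
    # Right-hand side alpha 1^T: every configuration (r, v) has a length-1 run.
    rhs = np.repeat(alpha[:, None], n, axis=1)
    # Column-major vec identity: vec(mu^T P A) = (A^T kron mu^T) vec(P).
    K = np.kron(A.T, mu.T)
    # Solve (I - K) vec(P) = vec(alpha 1^T); requires I - K to be invertible.
    p = np.linalg.solve(np.eye(k * n) - K, rhs.flatten(order="F"))
    P = p.reshape((k, n), order="F")
    # Elementwise product with the final weights, column by column.
    return P * tau[:, None]


# Usage on a directed path 0 -> 1 -> 2 with a single state.
A_path = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 0]], dtype=float)
pe = gape_single_label(A_path, np.array([[0.5]]), np.array([1.0]), np.array([1.0]))
print(pe)
```

On this path graph the runs can be enumerated by hand: node 0 has one run of weight 1, node 1 adds the run 0 → 1 with weight 0.5, and node 2 adds 1 → 2 (0.5) and 0 → 1 → 2 (0.25), so the single row of `pe` is 1, 1.5, 1.75. The dense Kronecker solve costs on the order of \((kn)^3\) operations, so it is only meant for small illustrative graphs.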

Bridging Graph Position Encodings for Transformers with Weighted Graph-Walking Automata

GTA: Graph Transformer Adapter

Graph Transformers without Positional Encodings

Comparing Graph Transformers via Positional Encodings

What Are Good Positional Encodings for Directed Graphs?

HyPE-GT: where Graph Transformers meet Hyperbolic Positional Encodings

On Structural Expressive Power of Graph Transformers

Rethinking Structural Encodings: Adaptive Graph Transformer for Node Classification Task.

Structural and positional ensembled encoding for Graph Transformer

GRPE: Relative Positional Encoding for Graph Transformer

Reach the Remote Neighbors: Dual-Encoding Transformer for Graphs

Transformers Meet Directed Graphs

Graph Neural Networks with Learnable Structural and Positional Representations

Unleashing the Power of Transformer for Graphs

Towards Principled Graph Transformers

Attending to Graph Transformers

AutoGT: Automated Graph Transformer Architecture Search

SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations

The Impact of Positional Encoding on Length Generalization in Transformers

A Generalization of Transformer Networks to Graphs

Recipe for a General, Powerful, Scalable Graph Transformer