Abstract: Road traffic forecasting is crucial in real-world intelligent transportation scenarios like traffic dispatching and path planning in city management and personal traveling. Spatio-temporal graph neural networks (STGNNs) stand out as the mainstream solution in this task. Nevertheless, the quadratic complexity of remarkable dynamic spatial modeling-based STGNNs has become the bottleneck over large-scale traffic data. From the spatial data management perspective, we present a novel Transformer framework called PatchSTG to efficiently and dynamically model spatial dependencies for large-scale traffic forecasting with interpretability and fidelity. Specifically, we design a novel irregular spatial patching to reduce the number of points involved in the dynamic calculation of Transformer. The irregular spatial patching first utilizes the leaf K-dimensional tree (KDTree) to recursively partition irregularly distributed traffic points into leaf nodes with a small capacity, and then merges leaf nodes belonging to the same subtree into occupancy-equaled and non-overlapped patches through padding and backtracking. Based on the patched data, depth and breadth attention are used interchangeably in the encoder to dynamically learn local and global spatial knowledge from points in a patch and points with the same index of patches. Experimental results on four real world large-scale traffic datasets show that our PatchSTG achieves train speed and memory utilization improvements up to $10\times$ and $4\times$ with the state-of-the-art performance.

What problem does this paper attempt to address?

This paper addresses the computational efficiency and accuracy of large-scale traffic forecasting. Existing methods based on spatio-temporal graph neural networks (STGNNs) face two bottlenecks on large-scale traffic data:

1. **High computational complexity**: Dynamic spatial modeling methods usually have quadratic complexity $O(N^2d)$ in the number of traffic points $N$, which makes them very inefficient at city scale (thousands or tens of thousands of points).
2. **Lack of interpretability and fidelity**: Some linear or low-rank dynamic spatial modeling methods reduce the computational complexity but sacrifice interpretability and fidelity, which degrades performance.

To solve these problems, the authors propose a novel Transformer framework named PatchSTG. It improves the efficiency and accuracy of large-scale traffic forecasting in two ways:

- **Irregular Spatial Patching**: Use a leaf KDTree to recursively divide irregularly distributed traffic points into small leaf nodes, then merge leaf nodes of the same subtree into occupancy-balanced, non-overlapping patches through padding and backtracking. This reduces the number of points that participate in each dynamic (attention) calculation.
- **Depth and Breadth Attention**: Alternate depth attention and breadth attention on the patched data. Depth attention learns local spatial knowledge from the points within a patch; breadth attention learns global spatial knowledge from the points that share the same index across patches.

With these designs, PatchSTG achieves state-of-the-art performance on four real-world large-scale traffic datasets while improving training speed and memory utilization by up to $10\times$ and $4\times$, respectively.
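
The patching idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `leaf_kdtree` and `merge_leaves` and the parameter `leaves_per_patch` are hypothetical, the tree is split at the median to a fixed depth, and short leaves are padded by repeating their own indices, which is a simple stand-in for whatever padding rule the paper uses.

```python
import numpy as np

def leaf_kdtree(coords, capacity):
    """Partition points into equally sized KD-tree leaves.

    coords: (N, 2) array of (lat, lng). Returns an int array of shape
    (num_leaves, capacity) holding point indices, leaves in left-to-right order.
    Assumption: leaves with fewer than `capacity` points are padded by
    repeating their own indices (illustrative padding only).
    """
    n = len(coords)
    assert n > 0 and capacity >= 2
    # Smallest depth at which every leaf holds at most `capacity` points.
    depth = 0
    while -(-n // 2 ** depth) > capacity:  # ceil(n / 2**depth)
        depth += 1

    leaves = []

    def split(idx, level):
        if level == depth:
            leaves.append(np.resize(idx, capacity))  # pad by cyclic repetition
            return
        axis = level % coords.shape[1]  # alternate between lat and lng
        order = idx[np.argsort(coords[idx, axis], kind="stable")]
        mid = len(order) // 2  # median split keeps the tree balanced
        split(order[:mid], level + 1)
        split(order[mid:], level + 1)

    split(np.arange(n), 0)
    return np.stack(leaves)

def merge_leaves(leaf_idx, leaves_per_patch):
    """Merge consecutive leaves (siblings of one subtree) into larger patches."""
    num_leaves, capacity = leaf_idx.shape
    assert num_leaves % leaves_per_patch == 0
    return leaf_idx.reshape(num_leaves // leaves_per_patch,
                            leaves_per_patch * capacity)

# Toy usage with random coordinates (made-up data, not a real dataset).
rng = np.random.default_rng(0)
coords = rng.uniform(size=(1000, 2))
leaf_idx = leaf_kdtree(coords, capacity=8)
patches = merge_leaves(leaf_idx, leaves_per_patch=4)
print(leaf_idx.shape, patches.shape)
```

Because the leaves are produced in left-to-right order, a run of consecutive leaves whose length is a power of two corresponds to one subtree, so a plain reshape is enough to merge them in this simplified setting.
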
### Formula Summary

- **Spatial Patching**:

\[ \widetilde{idx} = BFS(LKDT(Lat, Lng, C)) \]
\[ idx, \overline{idx} = Query(LKDT(Lat, Lng, C), CosSim(X, X^T)) \]
\[ \bar{X} = Pad(idx, \overline{idx}, \tilde{X}_{\widetilde{idx}}) \]

- **Depth Attention**:

\[ Q_i^{(l)} = W_Q^{(l)} X^{(l - 1)} \]
\[ K_i^{(l)} = W_K^{(l)} X^{(l - 1)} \]
\[ V_i^{(l)} = W_V^{(l)} X^{(l - 1)} \]
\[ A_i^{(l)} = Softmax\left(\frac{Q_i^{(l)} {K_i^{(l)}}^T}{\sqrt{d/o}}\right) V_i^{(l)} \]
\[ \tilde{X}^{(l)} = (A_1^{(l)} \| \ldots \| A_o^{(l)}) W_O^{(l)} \]

- **Breadth Attention**: Similar to depth attention, but at the patch level.
- **Projection Decoder**:

\[ \hat{Y} = W_D \tilde{Y} + b_D \]

Through these formulas and methods, PatchSTG effectively solves the computational complexity and model performance problems in large-scale traffic flow prediction.
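
To make the depth and breadth attention formulas more concrete, here is a minimal NumPy sketch under stated assumptions. The patched input is taken to be a tensor of shape `(P, S, d)` with `P` patches of `S` points each, the weights are random placeholders, the row-vector convention `X @ W` is used instead of `W X`, and residual connections, normalization and the temporal dimension are omitted. The function names are hypothetical and not taken from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, heads):
    """Scaled dot-product attention over the token axis of x: (batch, tokens, d)."""
    b, t, d = x.shape
    dh = d // heads  # per-head width, the d/o term in the formulas

    def project(w):
        # (batch, tokens, d) -> (batch, heads, tokens, dh)
        return (x @ w).reshape(b, t, heads, dh).transpose(0, 2, 1, 3)

    q, k, v = project(w_q), project(w_k), project(w_v)
    scores = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(dh))  # (b, heads, t, t)
    out = (scores @ v).transpose(0, 2, 1, 3).reshape(b, t, d)    # concatenate heads
    return out @ w_o

def depth_attention(x, params, heads):
    # x: (P, S, d). Tokens are the S points inside each patch (local).
    return multi_head_attention(x, *params, heads=heads)

def breadth_attention(x, params, heads):
    # Swap axes so tokens are the P points sharing one index across patches (global).
    xt = x.transpose(1, 0, 2)  # (S, P, d)
    return multi_head_attention(xt, *params, heads=heads).transpose(1, 0, 2)

# Toy usage with random features and weights (placeholders, not trained values).
rng = np.random.default_rng(0)
P, S, d, heads = 32, 32, 16, 4
x = rng.normal(size=(P, S, d))
params = tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4))
y = breadth_attention(depth_attention(x, params, heads), params, heads)
print(y.shape)
```

With `P = S = 32` (1,024 points), each head computes `P * S * S = 32,768` attention scores in the depth step and `S * P * P = 32,768` in the breadth step, compared with `1,024 * 1,024 = 1,048,576` for full attention over all points. This is the source of the savings the summary attributes to patching.
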

Efficient Large-Scale Traffic Forecasting with Transformers: A Spatial Data Management Perspective
