Efficient Large-Scale Traffic Forecasting with Transformers: A Spatial Data Management Perspective

Yuchen Fang, Yuxuan Liang, Bo Hui, Zezhi Shao, Liwei Deng, Xu Liu, Xinke Jiang, Kai Zheng
2024-12-13
Abstract: Road traffic forecasting is crucial in real-world intelligent transportation scenarios like traffic dispatching and path planning in city management and personal traveling. Spatio-temporal graph neural networks (STGNNs) stand out as the mainstream solution in this task. Nevertheless, the quadratic complexity of remarkable dynamic spatial modeling-based STGNNs has become the bottleneck over large-scale traffic data. From the spatial data management perspective, we present a novel Transformer framework called PatchSTG to efficiently and dynamically model spatial dependencies for large-scale traffic forecasting with interpretability and fidelity. Specifically, we design a novel irregular spatial patching to reduce the number of points involved in the dynamic calculation of Transformer. The irregular spatial patching first utilizes the leaf K-dimensional tree (KDTree) to recursively partition irregularly distributed traffic points into leaf nodes with a small capacity, and then merges leaf nodes belonging to the same subtree into occupancy-equaled and non-overlapped patches through padding and backtracking. Based on the patched data, depth and breadth attention are used interchangeably in the encoder to dynamically learn local and global spatial knowledge from points in a patch and points with the same index of patches. Experimental results on four real-world large-scale traffic datasets show that our PatchSTG achieves training speed and memory utilization improvements up to $10\times$ and $4\times$ with state-of-the-art performance.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses computational efficiency and accuracy in large-scale traffic flow prediction. Specifically, existing methods based on Spatio-Temporal Graph Neural Networks (STGNNs) face the following bottlenecks when dealing with large-scale traffic data:

1. **High computational complexity**: Dynamic spatial modeling methods usually have a quadratic complexity of \(O(N^2d)\), which makes them very inefficient when dealing with city-level traffic points (thousands or tens of thousands).
2. **Lack of interpretability and fidelity**: Some linear or low-rank dynamic spatial modeling methods reduce the computational complexity but sacrifice the interpretability and fidelity of the model, resulting in a decline in performance.

To solve these problems, the authors propose a novel Transformer framework named PatchSTG. This framework improves the efficiency and accuracy of large-scale traffic flow prediction in the following ways:

- **Irregular Spatial Patching**: Use a KDTree to recursively divide irregularly distributed traffic points into leaf nodes with a small capacity, then merge leaf nodes of the same subtree into occupancy-balanced and non-overlapping patches through padding and backtracking, thereby reducing the number of points participating in dynamic calculations.
- **Depth and Breadth Attention**: Alternately apply depth attention and breadth attention to the patched data. Depth attention learns local spatial knowledge from the points within a patch, and breadth attention learns global spatial knowledge from the points that share the same index across patches.

Through these innovations, PatchSTG achieves state-of-the-art performance on four real-world large-scale traffic datasets while improving training speed and memory utilization by up to 10 times and 4 times, respectively.
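The irregular spatial patching idea can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The function names (`split_to_leaves`, `build_patches`), the median split that alternates between latitude and longitude, and the padding rule (repeating a leaf's own indices) are all assumptions made for illustration; the paper's padding and backtracking procedure may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def split_to_leaves(idx: List[int], points: List[Point],
                    levels: int, axis: int = 0) -> List[List[int]]:
    """Median-split the point indices `levels` times, alternating the axis.

    Leaves are returned left to right, so any run of 2**k consecutive
    leaves that starts at a multiple of 2**k shares one subtree.
    """
    if levels == 0:
        return [idx]
    ordered = sorted(idx, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    nxt = 1 - axis
    return (split_to_leaves(ordered[:mid], points, levels - 1, nxt)
            + split_to_leaves(ordered[mid:], points, levels - 1, nxt))


def build_patches(points: List[Point], capacity: int,
                  leaves_per_patch: int) -> List[List[int]]:
    """Group points into equal-size, non-overlapping patches.

    Assumptions of this sketch: capacity >= 2 (so no leaf is empty) and
    leaves_per_patch is a power of two that divides the number of leaves.
    """
    if capacity < 2:
        raise ValueError("this sketch assumes capacity >= 2")
    n = len(points)
    levels = 0
    # Deepen the tree until the largest leaf, ceil(n / 2**levels), fits.
    while -(-n // (1 << levels)) > capacity:
        levels += 1
    leaves = split_to_leaves(list(range(n)), points, levels)
    # Pad short leaves by repeating their own indices (hypothetical rule),
    # so a padded index never leaks into another patch.
    padded = [leaf + [leaf[k % len(leaf)] for k in range(capacity - len(leaf))]
              for leaf in leaves]
    patches = []
    for start in range(0, len(padded), leaves_per_patch):
        group = padded[start:start + leaves_per_patch]
        patches.append([i for leaf in group for i in leaf])
    return patches


if __name__ == "__main__":
    demo_points = [(float(i % 4), float(i // 4)) for i in range(10)]
    for patch in build_patches(demo_points, capacity=3, leaves_per_patch=2):
        print(patch)
```

Because every patch ends up with the same number of slots, the patched data can be stored as a dense array of shape (patches, points per patch, features), which is the layout that attention within a patch and across patches relies on.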
### Formula Summary

- **Spatial Patching**:
  \[ \widetilde{idx} = \mathrm{BFS}(\mathrm{LKDT}(Lat, Lng, C)) \]
  \[ idx, \overline{idx} = \mathrm{Query}(\mathrm{LKDT}(Lat, Lng, C), \mathrm{CosSim}(X, X^T)) \]
  \[ \bar{X} = \mathrm{Pad}(idx, \overline{idx}, \tilde{X}_{\widetilde{idx}}) \]
- **Depth Attention**:
  \[ Q_i^{(l)} = W_Q^{(l)} X^{(l - 1)} \]
  \[ K_i^{(l)} = W_K^{(l)} X^{(l - 1)} \]
  \[ V_i^{(l)} = W_V^{(l)} X^{(l - 1)} \]
  \[ A_i^{(l)} = \mathrm{Softmax}\left(\frac{Q_i^{(l)} {K_i^{(l)}}^T}{\sqrt{d/o}}\right) V_i^{(l)} \]
  \[ \tilde{X}^{(l)} = (A_1^{(l)} \| \ldots \| A_o^{(l)}) W_O^{(l)} \]
- **Breadth Attention**: Similar to depth attention, but at the patch level.
- **Projection Decoder**:
  \[ \hat{Y} = W_D \tilde{Y} + b_D \]

Through these formulas and methods, PatchSTG effectively solves the computational complexity and model performance problems in large-scale traffic flow prediction.
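To make the difference between depth and breadth attention concrete, the following standard-library Python sketch applies one attention routine along two different axes of a patched input `x[p][s]` (patch `p`, slot `s`). It is a simplified illustration rather than the paper's encoder: it uses a single head and identity projections in place of the learned \(W_Q\), \(W_K\), \(W_V\), \(W_O\), and the helper names (`self_attention`, `depth_attention`, `breadth_attention`, `pairwise_scores`) are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def self_attention(rows: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention over `rows`.

    Simplifying assumption: queries, keys and values are the inputs
    themselves (identity projections), scaled by sqrt(d).
    """
    d = len(rows[0])
    out = []
    for q in rows:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in rows]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, rows)) / total
                    for j in range(d)])
    return out


def depth_attention(x: List[List[Vector]]) -> List[List[Vector]]:
    """Attend among the points inside each patch (local context)."""
    return [self_attention(patch) for patch in x]


def breadth_attention(x: List[List[Vector]]) -> List[List[Vector]]:
    """Attend among points that share a slot index across patches (global context)."""
    num_patches, size = len(x), len(x[0])
    cols = [self_attention([x[p][s] for p in range(num_patches)])
            for s in range(size)]
    return [[cols[s][p] for s in range(size)] for p in range(num_patches)]


def pairwise_scores(num_patches: int, size: int) -> Dict[str, int]:
    """Count attention scores for full, depth and breadth attention."""
    n = num_patches * size
    return {"full": n * n,
            "depth": num_patches * size * size,
            "breadth": size * num_patches * num_patches}


if __name__ == "__main__":
    x = [[[0.0, 0.0]] * 3, [[1.0, 1.0]] * 3]  # 2 patches, 3 slots, 2 features
    print(depth_attention(x)[0][0], breadth_attention(x)[0][0])
    print(pairwise_scores(4, 8))
```

For 4 patches of 8 points each, full attention over all 32 points computes 1024 pairwise scores, while depth and breadth attention together compute 256 + 128 = 384. This is the sense in which patching reduces the number of points involved in each dynamic calculation.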