PreMixer: MLP-Based Pre-training Enhanced MLP-Mixers for Large-scale Traffic Forecasting

Tongtong Zhang, Zhiyong Cui, Bingzhang Wang, Yilong Ren, Haiyang Yu, Pan Deng, Yinhai Wang
2024-12-18
Abstract: In urban computing, precise and swift forecasting of multivariate time series data from traffic networks is crucial. This data incorporates additional spatial contexts such as sensor placements and road network layouts, and exhibits complex temporal patterns that amplify challenges for predictive learning in traffic management, smart mobility demand, and urban planning. Consequently, there is an increasing need to forecast traffic flow across broader geographic regions and for higher temporal coverage. However, current research encounters limitations because of the inherent inefficiency of models and their unsuitability for large-scale traffic network applications due to model complexity. This paper proposes a novel framework, named PreMixer, designed to bridge this gap. It features a predictive model and a pre-training mechanism, both based on the principles of Multi-Layer Perceptrons (MLP). The PreMixer comprehensively considers temporal dependencies of traffic patterns in different time windows and processes the spatial dynamics as well. Additionally, we integrate spatio-temporal positional encoding to manage spatiotemporal heterogeneity without relying on predefined graphs. Furthermore, our innovative pre-training model uses a simple patch-wise MLP to conduct masked time series modeling, learning from long-term historical data segmented into patches to generate enriched contextual representations. This approach enhances the downstream forecasting model without incurring significant time consumption or computational resource demands owing to improved learning efficiency and data handling flexibility. Our framework achieves comparable state-of-the-art performance while maintaining high computational efficiency, as verified by extensive experiments on large-scale traffic datasets.
Machine Learning, Emerging Technologies
What problem does this paper attempt to address?
The paper asks how to forecast multivariate time series accurately and efficiently on large-scale traffic networks. Specifically, it targets traffic flow prediction in urban computing: handling complex temporal patterns and spatial dynamics over wider geographic areas and longer time spans, while keeping the model efficient and scalable.

### Problem Background

1. **Complex Temporal Patterns and Spatial Dynamics**
   - Traffic data combines complex temporal patterns (such as periodicity and trends) with spatial context (such as sensor locations and road network layouts), which makes the prediction task more challenging.
   - Large-scale transportation systems are non-stationary and complex, so their spatio-temporal data also exhibits intricate long-term patterns.
2. **Limitations of Existing Models**
   - Existing traffic prediction models, especially those based on spatio-temporal graph neural networks (STGNNs) and Transformers, run into efficiency and scalability problems on large-scale transportation networks. They usually require substantial computing resources, and model complexity rises sharply as the number of nodes and the time coverage grow.
   - Feeding long-term spatio-temporal data directly into these models leads to overly long training and inference times and makes the models harder to optimize.
3. **Lack of Effective Utilization of Long-Term Features**
   - Although some pre-trained models can improve downstream performance, they usually rely on complex architectures (such as Transformers), which increases time and computing costs, especially when deployed on large-scale transportation networks.
   - Existing methods often overlook features that span long time horizons, which limits the model's ability to learn long-term patterns and hurts prediction performance.

### Solution

To address these problems, the paper proposes a new framework named PreMixer. Its main components are:

1. **MLP-Based Prediction Model**
   - PreMixer uses MLP-Mixer as its basic architecture and captures the temporal and spatial information of the input data through interleaved MLP layers. The architecture is simple and efficient and can handle large-scale traffic data.
2. **Spatio-Temporal Positional Encoding (STPE)**
   - STPE encodes temporal and spatial position information jointly, without relying on a predefined graph structure. This gives the model additional context while significantly reducing computational complexity.
3. **MLP-Based Pre-training Model (PIEncoder)**
   - PIEncoder is a simple MLP-based pre-training model that learns useful representations from long-term historical data. It splits the time series into multiple segments (patches) and embeds each segment independently, which improves learning efficiency and reduces the demand for computing resources.
   - PIEncoder adopts a masked autoencoding strategy: it reconstructs the masked segments to produce context-rich representations that strengthen the downstream prediction model.
4. **Contrastive Learning**
   - Complementary contrastive learning (CL) further enhances the segment-level time series representations. By generating positive sample pairs from different views, CL captures temporal dependencies and dynamic changes, improving the model's generalization and discriminative ability.
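
The patch segmentation, masking, and interleaved MLP mixing described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the ReLU activation, the zeroing of masked patch embeddings, the 75% mask ratio, the patch length of 12, and all layer sizes are assumptions chosen for illustration, and the weights are random (nothing is trained). The stages are chained here only to show how the tensor shapes flow; the summary does not specify how PIEncoder and the Mixer backbone are actually connected.

```python
import numpy as np

def patchify(series, patch_len):
    """Split (num_nodes, T) into non-overlapping patches: (num_nodes, P, patch_len)."""
    num_nodes, total_len = series.shape
    num_patches = total_len // patch_len
    trimmed = series[:, : num_patches * patch_len]  # drop an incomplete tail patch
    return trimmed.reshape(num_nodes, num_patches, patch_len)

def random_patch_mask(num_patches, mask_ratio, rng):
    """Boolean vector of length P; True marks a patch that is hidden."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def init_mlp(rng, dim_in, dim_hidden, dim_out):
    """Random weights for a two-layer MLP (illustrative initialisation only)."""
    return (rng.normal(0.0, 0.1, (dim_in, dim_hidden)), np.zeros(dim_hidden),
            rng.normal(0.0, 0.1, (dim_hidden, dim_out)), np.zeros(dim_out))

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU, applied over the last axis of x."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def mixer_block(x, time_params, space_params):
    """Mix x of shape (num_nodes, P, D) along the patch axis, then the node axis."""
    # Temporal mixing: move the patch axis last, apply the MLP, move it back.
    x = x + np.swapaxes(mlp(np.swapaxes(x, 1, 2), *time_params), 1, 2)
    # Spatial mixing: move the node axis last, apply the MLP, move it back.
    x = x + np.swapaxes(mlp(np.swapaxes(x, 0, 2), *space_params), 0, 2)
    return x

rng = np.random.default_rng(0)
series = rng.normal(size=(4, 96))              # 4 hypothetical sensors, 96 time steps
patches = patchify(series, patch_len=12)       # (4, 8, 12)
mask = random_patch_mask(patches.shape[1], 0.75, rng)

embed = init_mlp(rng, 12, 32, 16)              # each patch is embedded independently
decode = init_mlp(rng, 16, 32, 12)
tokens = mlp(patches, *embed)                  # (4, 8, 16)
tokens[:, mask, :] = 0.0                       # hide the masked patches

mixed = mixer_block(tokens, init_mlp(rng, 8, 16, 8), init_mlp(rng, 4, 8, 4))
recon = mlp(mixed, *decode)                    # (4, 8, 12)

# Reconstruction error is measured on the masked patches only.
loss = np.mean((recon[:, mask] - patches[:, mask]) ** 2)
print(patches.shape, int(mask.sum()), recon.shape)
```

Because every step is a dense matrix product over one axis, the cost grows linearly with the number of patches rather than quadratically as in self-attention, which is the efficiency argument the summary makes for an MLP-based design.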
### Summary

With the PreMixer framework, the paper aims to solve the efficiency and scalability problems of large-scale traffic prediction while making full use of long-term spatio-temporal features to improve accuracy. The experimental results show that PreMixer matches or outperforms existing advanced methods on multiple large-scale traffic datasets while maintaining high computational efficiency.