Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting

Peng Chen, Yingying Zhang, Yunyao Cheng, Yang Shu, Yihang Wang, Qingsong Wen, Bin Yang, Chenjuan Guo
2024-09-15
Abstract: Transformers for time series forecasting mainly model time series from limited or fixed scales, making it challenging to capture different characteristics spanning various scales. We propose Pathformer, a multi-scale Transformer with adaptive pathways. It integrates both temporal resolution and temporal distance for multi-scale modeling. Multi-scale division divides the time series into different temporal resolutions using patches of various sizes. Based on the division of each scale, dual attention is performed over these patches to capture global correlations and local details as temporal dependencies. We further enrich the multi-scale Transformer with adaptive pathways, which adaptively adjust the multi-scale modeling process based on the varying temporal dynamics of the input, improving the accuracy and generalization of Pathformer. Extensive experiments on eleven real-world datasets demonstrate that Pathformer not only achieves state-of-the-art performance by surpassing all current models but also exhibits stronger generalization abilities under various transfer scenarios. The code is made available at https://github.com/decisionintelligence/pathformer.
Machine Learning
What problem does this paper attempt to address?
This paper addresses a limitation of existing Transformer models for time-series forecasting: they model time series at limited or fixed scales, which makes it difficult to capture characteristics that span different scales. Specifically, the paper identifies two main challenges that limit multi-scale modeling with Transformers in time-series forecasting:

1. **Incompleteness of multi-scale modeling**: Merely changing the temporal resolution cannot explicitly and efficiently emphasize temporal dependencies of different ranges. Although considering different temporal distances can model dependencies of different ranges (such as global and local correlations), these distances are affected by how the data is divided and are incomplete from the perspective of a single temporal resolution.
2. **Fixed multi-scale modeling process**: Different time series may prefer different scales, depending on their specific temporal characteristics and dynamics. A fixed multi-scale modeling method uses the same scales for all data, which hinders capturing the key patterns of each time series. Manually tuning the optimal scales is both time-consuming and difficult.

To solve these problems, the paper proposes **Pathformer**, a multi-scale Transformer with adaptive pathways. Pathformer addresses the above challenges in the following ways:

- **Multi-scale Transformer block**: It combines the two perspectives of temporal resolution and temporal distance. Multi-scale division splits the time series into patches of different sizes, and dual attention (intra-patch attention and inter-patch attention) is performed on the patches of each scale to capture temporal dependencies.
- **Adaptive pathways**: A multi-scale router adaptively selects specific patch sizes, and the subsequent dual attention, according to the input data, which controls the extraction of multi-scale features. The router works together with an aggregator that combines the multi-scale features through weighted aggregation.

Through these designs, Pathformer achieves state-of-the-art forecasting performance on the eleven real-world datasets evaluated and shows stronger generalization ability in various transfer scenarios.
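
The routing-and-aggregation idea described above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. It assumes non-overlapping patches, takes the router scores as given numbers, selects the top-K patch sizes, and mixes the per-scale outputs with softmax weights. The per-scale feature extractor is a deliberate placeholder (each point is replaced by its patch mean) standing in for dual attention, and all function names, patch sizes, and scores are hypothetical.

```python
import math


def divide_into_patches(series, patch_size):
    """Split a 1-D series into non-overlapping patches of length patch_size."""
    if len(series) % patch_size != 0:
        raise ValueError("series length must be divisible by patch_size")
    return [series[i:i + patch_size] for i in range(0, len(series), patch_size)]


def placeholder_scale_features(series, patch_size):
    """Placeholder for dual attention: replace each point by its patch mean."""
    features = []
    for patch in divide_into_patches(series, patch_size):
        mean = sum(patch) / len(patch)
        features.extend([mean] * len(patch))
    return features


def top_k_weights(scores, k):
    """Softmax over the k highest scores; every other scale gets weight 0."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    max_score = max(scores[i] for i in top)  # subtract max for stability
    exps = {i: math.exp(scores[i] - max_score) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


def route_and_aggregate(series, patch_sizes, scores, k):
    """Weighted sum of per-scale features over the k selected patch sizes."""
    weights = top_k_weights(scores, k)
    output = [0.0] * len(series)
    for size, weight in zip(patch_sizes, weights):
        if weight == 0.0:
            continue  # unselected scales are skipped entirely
        features = placeholder_scale_features(series, size)
        output = [o + weight * f for o, f in zip(output, features)]
    return weights, output


# Toy input: the scores are made-up router outputs, one per patch size.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
patch_sizes = [2, 4, 8]
scores = [0.5, 0.5, -1.0]
weights, mixed = route_and_aggregate(series, patch_sizes, scores, k=2)
```

In this toy run the patch size 8 receives zero weight, so only the two finer scales contribute to `mixed`. In a trained model the scores would depend on the input series, which is what makes the selected pathway adaptive rather than fixed.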