Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting

Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long
2023-11-24
Abstract: Transformers have shown great power in time series forecasting due to their global-range modeling ability. However, their performance can degenerate terribly on non-stationary real-world data in which the joint distribution changes over time. Previous studies primarily adopt stationarization to attenuate the non-stationarity of original series for better predictability. But the stationarized series deprived of inherent non-stationarity can be less instructive for real-world bursty events forecasting. This problem, termed over-stationarization in this paper, leads Transformers to generate indistinguishable temporal attentions for different series and impedes the predictive capability of deep models. To tackle the dilemma between series predictability and model capability, we propose Non-stationary Transformers as a generic framework with two interdependent modules: Series Stationarization and De-stationary Attention. Concretely, Series Stationarization unifies the statistics of each input and converts the output with restored statistics for better predictability. To address the over-stationarization problem, De-stationary Attention is devised to recover the intrinsic non-stationary information into temporal dependencies by approximating distinguishable attentions learned from raw series. Our Non-stationary Transformers framework consistently boosts mainstream Transformers by a large margin, which reduces MSE by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer, making them the state-of-the-art in time series forecasting. Code is available at this repository: https://github.com/thuml/Nonstationary_Transformers.
Machine Learning, Signal Processing
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper primarily explores the issue of non-stationarity in time series forecasting and proposes a new framework, Non-stationary Transformers (NST), to tackle this challenge.

1. **Existing Problems**:
   - Transformers exhibit strong capabilities in time series forecasting, but their performance significantly drops on real-world data due to non-stationarity.
   - Non-stationary time series are characterized by statistical properties that change over time, making prediction difficult.
   - Traditional preprocessing methods (such as stationarization) can reduce non-stationarity but often lead to the loss of important information in the original time series, thereby affecting prediction accuracy.
2. **Key Challenges**:
   - Over-stationarization problem: After stationarizing the time series, the attention mechanism learned by the transformer becomes less distinctive, failing to capture unique dependencies between different sequences.
   - How to improve prediction accuracy while preserving the non-stationary characteristics of the original data remains a pressing issue.
3. **Solution**:
   - Propose the Non-stationary Transformers framework, which includes two modules: Series Stationarization and De-stationary Attention.
   - **Series Stationarization**: Unifies the statistical properties of the input data through normalization, enhancing predictability.
   - **De-stationary Attention**: Restores the non-stationary information in the original data, avoiding the negative impacts of over-stationarization.

Through the above design, Non-stationary Transformers can maintain data predictability while fully leveraging the key temporal dependencies in the original data, significantly improving forecasting performance. Experiments demonstrate that this method outperforms various existing time series forecasting models on 6 real-world benchmark datasets.
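The normalize-then-restore idea behind Series Stationarization, and the information it discards, can be illustrated with a small sketch. This is not the authors' implementation: it assumes per-window zero-mean, unit-variance normalization, and the function names, the `eps` constant, and the `naive_forecast` stand-in model are hypothetical choices made only for illustration. De-stationary Attention is not sketched, because the summary above does not give its formulation.

```python
import math


def stationarize(series, eps=1e-5):
    """Normalize one input window to zero mean and (roughly) unit variance.

    Returns the normalized values plus the mean and std needed to undo it.
    `eps` is an assumed small constant that avoids division by zero.
    """
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = math.sqrt(var + eps)
    return [(x - mean) / std for x in series], mean, std


def destationarize(values, mean, std):
    """Restore the original level and scale on a model's output."""
    return [v * std + mean for v in values]


def naive_forecast(normalized, horizon):
    """Hypothetical stand-in for a forecasting model: repeat the last value."""
    return [normalized[-1]] * horizon


# Two windows with the same shape but very different level and scale.
low = [1.0, 2.0, 3.0, 4.0]
high = [100.0, 200.0, 300.0, 400.0]

low_norm, low_mean, low_std = stationarize(low)
high_norm, high_mean, high_std = stationarize(high)

# The model only ever sees the normalized windows; the saved statistics
# are applied afterwards to put the forecast back on the original scale.
forecast_low = destationarize(naive_forecast(low_norm, 2), low_mean, low_std)
forecast_high = destationarize(naive_forecast(high_norm, 2), high_mean, high_std)
```

After normalization, `low_norm` and `high_norm` are almost identical, so a model that sees only the normalized windows cannot tell the two series apart. That is the over-stationarization concern described above. The saved mean and standard deviation restore the scale of the output, but they do not influence what the model learns from the input.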