Autoregressive Moving-average Attention Mechanism for Time Series Forecasting

Jiecheng Lu, Xu Han, Yan Sun, Shihao Yang
2024-10-04
Abstract: We propose an Autoregressive (AR) Moving-average (MA) attention structure that can adapt to various linear attention mechanisms, enhancing their ability to capture long-range and local temporal patterns in time series. In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines when appropriate tokenization and training methods are applied. Moreover, inspired by the ARMA model from statistics and recent advances in linear attention, we introduce the full ARMA structure into existing autoregressive attention mechanisms. By using an indirect MA weight generation method, we incorporate the MA term while maintaining the time complexity and parameter size of the underlying efficient attention models. We further explore how indirect parameter generation can produce implicit MA weights that align with the modeling requirements for local temporal impacts. Experimental results show that incorporating the ARMA structure consistently improves the performance of various AR attentions on TSF tasks, achieving state-of-the-art results.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses several key issues in time series forecasting (TSF):

1. **Improving forecasting performance**: The paper proposes an Autoregressive Moving-Average (ARMA) attention mechanism that extends existing autoregressive (AR) attention mechanisms so they capture both long-term and short-term patterns in time series.
2. **Exploring decoder-only Transformers for TSF**: Encoder-only Transformers and simpler structures such as Multi-Layer Perceptrons (MLPs) perform well in time series forecasting, but the potential of decoder-only Transformers has not been fully explored. The paper shows that, with appropriate tokenization and training methods, a basic AR Transformer achieves results comparable to existing state-of-the-art baselines.
3. **Keeping computational complexity low**: Efficient linear attention mechanisms reduce computational complexity but remain weak at modeling local patterns. The ARMA structure improves forecasting performance while keeping the O(N) time complexity of the underlying linear attention.
4. **Handling long-term and short-term dependencies**: Traditional Exponential Moving Average (EMA) methods smooth local data well but struggle to capture long-term information. The ARMA structure decouples long-term and short-term effects by combining the cumulative impact of historical data (the AR term) with that of past prediction errors (the MA term).

Specifically, the main contributions of the paper include:

- Demonstrating that, with appropriate tokenization and preprocessing, AR Transformers reach the level of existing state-of-the-art baselines, and that decoder-only Transformers equipped with ARMA attention surpass these baselines.
- Proposing the ARMA attention mechanism, which adds Moving Average (MA) terms to existing AR attention mechanisms without increasing time complexity or the number of parameters. With the MA terms, ARMA Transformers outperform their AR counterparts in forecasting performance.
- Designing a computationally efficient indirect MA weight generation method. The resulting implicit MA weights capture important short-term effects, which lets the AR terms focus on long-term and cyclical patterns.

Through extensive experiments and visual analysis, the paper validates that ARMA balances long-term and short-term dependencies and improves the forecasting performance of AR Transformers.
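
To make the idea of adding an MA term to an O(N) attention concrete, the following is a minimal sketch and not the authors' implementation. It assumes an unnormalized causal linear attention with no feature map as the AR term, defines the residual at position j as the one-step-ahead error `v[j+1] - o_ar[j]`, and computes the MA term as a second causal linear attention over past residuals. The names `causal_linear_attention`, `arma_attention`, `q_ma`, and `k_ma` are hypothetical; in particular, passing separate `q_ma`/`k_ma` inputs is a simplification and does not reproduce the paper's indirect, parameter-free MA weight generation.

```python
def outer_add(state, k_t, x_t):
    """In place: state += outer(k_t, x_t)."""
    for a, k_a in enumerate(k_t):
        for b, x_b in enumerate(x_t):
            state[a][b] += k_a * x_b


def read(state, q_t):
    """Return q_t @ state as a list of length d_v."""
    d_v = len(state[0])
    return [sum(q_t[a] * state[a][b] for a in range(len(q_t)))
            for b in range(d_v)]


def causal_linear_attention(q, k, x):
    """o_t = sum_{i<=t} (q_t . k_i) x_i, using a running state (linear in N)."""
    state = [[0.0] * len(x[0]) for _ in range(len(k[0]))]
    out = []
    for q_t, k_t, x_t in zip(q, k, x):
        outer_add(state, k_t, x_t)       # accumulate k_t^T x_t
        out.append(read(state, q_t))     # query the accumulated state
    return out


def arma_attention(q, k, v, q_ma, k_ma):
    """AR linear attention plus an MA term over past one-step-ahead residuals.

    Assumption (illustrative only): q_ma and k_ma are separate inputs that
    weight the residuals; the paper generates MA weights indirectly instead.
    """
    n, d_v = len(v), len(v[0])
    o_ar = causal_linear_attention(q, k, v)
    # Residual r[j] = v[j+1] - o_ar[j] is only known once v[j+1] is observed,
    # so position t may use r[0] .. r[t-1] and nothing later.
    r = [[v[j + 1][b] - o_ar[j][b] for b in range(d_v)] for j in range(n - 1)]
    state = [[0.0] * d_v for _ in range(len(k_ma[0]))]
    out = [list(o_ar[0])]                # position 0 has no past residuals
    for t in range(1, n):
        outer_add(state, k_ma[t - 1], r[t - 1])
        o_ma = read(state, q_ma[t])
        out.append([a + m for a, m in zip(o_ar[t], o_ma)])
    return out
```

Both passes keep a fixed-size running state, so the cost stays linear in the sequence length N, which is the property the summary above attributes to ARMA attention. Because each residual depends on the next observed value, the MA term at position t only reads residuals up to t-1, which keeps the whole layer causal.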