SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention

Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, Ievgen Redko

2024-06-03

Abstract: Transformer-based architectures achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem for which we show that transformers are incapable of converging to their true solution despite their high expressive power. We further identify the attention of transformers as being responsible for this low generalization capacity. Building upon this insight, we propose a shallow lightweight transformer model that successfully escapes bad local minima when optimized with sharpness-aware optimization. We empirically demonstrate that this result extends to all commonly used real-world multivariate time series datasets. In particular, SAMformer surpasses current state-of-the-art methods and is on par with the biggest foundation model MOIRAI while having significantly fewer parameters. The code is available at https://github.com/romilbert/samformer.

Machine Learning

What problem does this paper attempt to address?

The paper addresses the poor performance of Transformer architectures in long-term multivariate time series forecasting. Although Transformers have achieved breakthrough results in natural language processing and computer vision, on this task they remain inferior to simple linear baselines. To understand why, the authors first study a toy linear forecasting problem and show that a Transformer fails to converge to the true solution despite its high expressive power; they identify the attention mechanism as the main cause of this poor generalization. Building on this finding, the paper proposes SAMformer, a shallow, lightweight Transformer trained with Sharpness-Aware Minimization (SAM) so that it escapes bad local minima. SAMformer also uses channel-wise attention and Reversible Instance Normalization (RevIN) to improve expressiveness and stability. Experiments show that SAMformer surpasses current state-of-the-art methods on commonly used real-world multivariate time series datasets and is on par with the foundation model MOIRAI while having significantly fewer parameters. In short, the study targets the training stability and generalization of Transformers in multivariate time series forecasting.
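
The three ingredients named in this summary (SAM, RevIN, and channel-wise attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which lives in the repository linked in the abstract. The function names, the default values of `lr` and `rho`, the omission of RevIN's learnable affine parameters, and the single-head attention without an output projection are all simplifying assumptions made here for illustration.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One sharpness-aware minimization (SAM) update (illustrative sketch)."""
    g = grad_fn(w)
    # Ascent: step to the approximate worst-case point in an L2 ball of radius rho.
    e = rho * g / (np.linalg.norm(g) + eps)
    # Descent: apply the gradient taken at the perturbed point to the original weights.
    return w - lr * grad_fn(w + e)

def revin_normalize(x, eps=1e-5):
    """Normalize each channel of x (channels, length) by its own statistics.

    Simplified: the learnable affine parameters of RevIN are omitted (assumption).
    """
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True) + eps
    return (x - mean) / std, mean, std

def revin_denormalize(y, mean, std):
    """Undo revin_normalize on a forecast y using the stored input statistics."""
    return y * std + mean

def channel_attention(x, w_q, w_k, w_v):
    """Single-head attention across channels (hypothetical helper).

    x has shape (D channels, L time steps), so the score matrix is D x D:
    each channel attends to the other channels, not to time steps.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    a = np.exp(scores)
    a /= a.sum(axis=1, keepdims=True)
    return a @ v, a
```

In a training loop under these assumptions, each input window would pass through `revin_normalize`, the model would apply `channel_attention` followed by a linear forecasting layer, the forecast would pass through `revin_denormalize`, and the weights would be updated with `sam_step` in place of a plain gradient step. Note that SAM evaluates the gradient twice per update, once at the current weights and once at the perturbed weights.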

Foreformer: an Enhanced Transformer-Based Framework for Multivariate Time Series Forecasting

TFEformer: Temporal Feature Enhanced Transformer for Multivariate Time Series Forecasting

Hidformer: Hierarchical Dual-Tower Transformer Using Multi-Scale Mergence for Long-Term Time Series Forecasting

iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting

Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting

Robformer: A robust decomposition transformer for long-term time series forecasting

ETSformer: Exponential Smoothing Transformers for Time-series Forecasting

Two Steps Forward and One Behind: Rethinking Time Series Forecasting with Deep Learning

Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

InParformer: Evolutionary Decomposition Transformers with Interactive Parallel Attention for Long-Term Time Series Forecasting

Non-stationary Transformers: Rethinking the Stationarity in Time Series Forecasting

RSMformer: an efficient multiscale transformer-based framework for long sequence time-series forecasting

Sparse transformer with local and seasonal adaptation for multivariate time series forecasting

TS-Fastformer: Fast Transformer for Time-Series Forecasting

Local Attention Mechanism: Boosting the Transformer Architecture for Long-Sequence Time Series Forecasting

sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting

Scalable Transformer for High Dimensional Multivariate Time Series Forecasting

Multi-scale convolution enhanced transformer for multivariate long-term time series forecasting