Abstract: The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks, first in natural language processing and later in computer vision. This model is based on the attention mechanism and is able to capture complex semantic relationships between a variety of patterns present in the input data. Precisely because of these characteristics, the Transformer has recently been exploited for time series forecasting problems, assuming a natural adaptability to the domain of continuous numerical series. Despite the acclaimed results in the literature, some works have raised doubts about the robustness and effectiveness of this approach. In this paper, we further investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting, demonstrate their limitations, and propose a set of alternative models that are better-performing and significantly less complex. In particular, we empirically show how simplifying Transformer-based forecasting models almost always leads to an improvement, reaching state-of-the-art performance. We also propose shallow models without the attention mechanism, which compete with the overall state of the art in long time series forecasting, and demonstrate their ability to accurately predict time series over extremely long windows. From a methodological perspective, we show how it is always necessary to use a simple baseline to verify the effectiveness of proposed models, and finally, we conclude the paper with a reflection on recent research paths and the opportunity to follow trends and hype even where it may not be necessary.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper examines the effectiveness and limitations of Transformer-based models in time series forecasting. Specifically:

1. **Evaluating the Effectiveness of Existing Models**: Transformer models have achieved great success in natural language processing and computer vision and have been widely applied to time series forecasting, but some studies have questioned their robustness and effectiveness in this area. The paper further explores how these models perform in time series forecasting and reveals their limitations.
2. **Proposing Alternative Models**: The authors propose a set of alternative models that outperform existing Transformer-based models while being significantly less complex. In particular, they show empirically that simplifying Transformer-based forecasting models almost always improves performance, reaching or even surpassing the current state of the art.
3. **Emphasizing the Importance of Baseline Models**: The paper stresses that new models should always be compared against a simple baseline (such as the Persistence model) to verify that they are actually effective. The authors point out that some very simple models can rival or even outperform complex Transformer models.
4. **Exploring the Potential of Shallow Models**: The authors introduce two shallow models, the novel Sine Layer Perceptron (SLP) and the traditional Multi-Layer Perceptron (MLP), and demonstrate experimentally that these models are stable and robust in long-window time series forecasting.

### Summary

The main purpose of the paper is to re-evaluate the effectiveness of Transformer-based time series forecasting models, reveal their limitations, and propose a set of simpler, better-performing alternatives. The experiments show that these alternatives outperform existing Transformer models and are also more stable and robust over extremely long prediction windows. The paper additionally emphasizes that new models should be evaluated against simple baselines, so that reported improvements reflect real gains.
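The baseline check mentioned above can be illustrated with a minimal sketch. It assumes the usual definition of a persistence forecast (repeat the last observed value over the whole horizon) and uses mean squared error as the metric; the function names, the toy sine series, and the 96/24 split are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def persistence_forecast(history, horizon):
    """Persistence baseline: repeat the last observed value for every future step."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def mean_squared_error(actual, predicted):
    """Average squared difference between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def beats_baseline(history, actual, model_forecast):
    """Return True only if the model's error is strictly lower than persistence."""
    baseline = persistence_forecast(history, len(actual))
    model_error = mean_squared_error(actual, model_forecast)
    baseline_error = mean_squared_error(actual, baseline)
    return model_error < baseline_error


# Toy data (an assumption for illustration): a sine wave split into
# a look-back window and a forecasting horizon.
series = [math.sin(0.1 * t) for t in range(120)]
history, future = series[:96], series[96:]

baseline_mse = mean_squared_error(future, persistence_forecast(history, len(future)))
print(f"persistence MSE on the toy series: {baseline_mse:.4f}")
```

A candidate model's forecast for the same horizon would be passed to `beats_baseline(history, future, forecast)`; a result of `False` suggests the added complexity is not paying for itself on that series.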

Two Steps Forward and One Behind: Rethinking Time Series Forecasting with Deep Learning

TFEformer: Temporal Feature Enhanced Transformer for Multivariate Time Series Forecasting

Hidformer: Hierarchical Dual-Tower Transformer Using Multi-Scale Mergence for Long-Term Time Series Forecasting

Enhancing Time Series Forecasting: A Hierarchical Transformer with Probabilistic Decomposition Representation

iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

Deep Double Descent for Time Series Forecasting: Avoiding Undertrained Models

Time Series Forecasting (TSF) Using Various Deep Learning Models

A Systematic Review for Transformer-based Long-term Series Forecasting

Deep Time Series Forecasting Models: A Comprehensive Survey

Dateformer: Time-modeling Transformer for Longer-term Series Forecasting

Are Transformers Effective for Time Series Forecasting?

Enhanced Linear and Vision Transformer-Based Architectures for Time Series Forecasting

InParformer: Evolutionary Decomposition Transformers with Interactive Parallel Attention for Long-Term Time Series Forecasting

Inter-Series Transformer: Attending to Products in Time Series Forecasting

Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case

Dateformer: Transformer Extends Look-back Horizon to Predict Longer-term Time Series

A Differential Attention Fusion Model Based on Transformer for Time Series Forecasting

TreeDRNet: A Robust Deep Model for Long Term Time Series Forecasting

Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

A Time Series is Worth 64 Words: Long-term Forecasting with Transformers