iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, Mingsheng Long
DOI: https://doi.org/10.48550/arxiv.2310.06625
2023-01-01
Abstract: The recent boom of linear forecasting models questions the ongoing passion for architectural modifications of Transformer-based forecasters. These forecasters leverage Transformers to model the global dependencies over temporal tokens of time series, with each token formed by multiple variates of the same timestamp. However, Transformers are challenged in forecasting series with larger lookback windows due to performance degradation and computation explosion. Besides, the embedding for each temporal token fuses multiple variates that represent potential delayed events and distinct physical measurements, which may fail in learning variate-centric representations and result in meaningless attention maps. In this work, we reflect on the competent duties of Transformer components and repurpose the Transformer architecture without any modification to the basic components. We propose iTransformer that simply applies the attention and feed-forward network on the inverted dimensions. Specifically, the time points of individual series are embedded into variate tokens which are utilized by the attention mechanism to capture multivariate correlations; meanwhile, the feed-forward network is applied for each variate token to learn nonlinear representations. The iTransformer model achieves state-of-the-art on challenging real-world datasets, which further empowers the Transformer family with promoted performance, generalization ability across different variates, and better utilization of arbitrary lookback windows, making it a nice alternative as the fundamental backbone of time series forecasting. Code is available at this repository: https://github.com/thuml/iTransformer.
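To make the "inverted dimensions" idea concrete, the following is a minimal NumPy sketch of one inverted block, not the authors' implementation (see the repository above for that). It assumes a single attention head, one block, random untrained weights, no biases, and a plain linear projection head. All function names, parameter keys, and sizes (`inverted_block_forecast`, `init_params`, `W_embed`, `D`, `H`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Normalize each token (row) over its feature dimension; no learned scale/shift.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def init_params(T, S, D, H, seed=0):
    # Hypothetical parameter set: T = lookback length, S = forecast horizon,
    # D = token width, H = feed-forward hidden width. None depends on the
    # number of variates N.
    rng = np.random.default_rng(seed)

    def w(*shape):
        return rng.normal(scale=0.1, size=shape)

    return {
        "W_embed": w(T, D),  # embeds a whole series (T time points) into one token
        "W_q": w(D, D), "W_k": w(D, D), "W_v": w(D, D),
        "W_1": w(D, H), "W_2": w(H, D),
        "W_out": w(D, S),    # projects each variate token to S future steps
    }


def inverted_block_forecast(x, params):
    """x: lookback window of shape (T, N). Returns (forecast (S, N), attention (N, N))."""
    # Inversion: transpose so that each of the N variates becomes one token.
    tokens = x.T @ params["W_embed"]                      # (N, D)

    # Self-attention across variate tokens captures multivariate correlations.
    q = tokens @ params["W_q"]
    k = tokens @ params["W_k"]
    v = tokens @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))        # (N, N)
    tokens = layer_norm(tokens + attn @ v)

    # Feed-forward network applied to each variate token independently.
    hidden = np.maximum(tokens @ params["W_1"], 0.0)      # ReLU
    tokens = layer_norm(tokens + hidden @ params["W_2"])

    forecast = tokens @ params["W_out"]                   # (N, S)
    return forecast.T, attn


# Usage: 96 lookback steps, 7 variates, 24-step forecast.
params = init_params(T=96, S=24, D=16, H=32)
x = np.random.default_rng(1).normal(size=(96, 7))
y, attn = inverted_block_forecast(x, params)
print(y.shape, attn.shape)
```

Because the attention map has shape (N, N), each entry relates one variate to another rather than one timestamp to another. No weight in this sketch depends on N, so the same parameters can be applied to an input with a different number of variates, which is one way to read the abstract's claim about generalization across variates.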
What problem does this paper attempt to address?