Abstract: Deep models for Multivariate Time Series (MTS) forecasting have recently demonstrated significant success. Channel-dependent models capture complex dependencies that channel-independent models cannot. However, the number of channels in real-world applications outpaces the capabilities of existing channel-dependent models and, contrary to common expectations, some of these models underperform channel-independent models on high-dimensional data, which raises questions about their performance. To address this, our study first investigates the reasons behind the suboptimal performance of channel-dependent models on high-dimensional MTS data. Our analysis reveals two primary issues: noise introduced by unrelated series, which makes it harder to capture the crucial inter-channel dependencies, and training-strategy challenges posed by high-dimensional data. To address these issues, we propose STHD, the Scalable Transformer for High-Dimensional Multivariate Time Series Forecasting. STHD has three components: a) Relation Matrix Sparsity, which limits the noise introduced and alleviates the memory issue; b) ReIndex, a training strategy that enables more flexible batch size settings and increases the diversity of training data; and c) a Transformer that handles 2-D inputs and captures channel dependencies. These components jointly enable STHD to handle high-dimensional MTS while maintaining computational feasibility. Furthermore, experimental results show that STHD achieves considerable improvement on three high-dimensional datasets: Crime-Chicago, Wiki-People, and Traffic. The source code and dataset are publicly available at https://github.com/xinzzzhou/ScalableTransformer4HighDimensionMTSF.git.
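
The idea behind Relation Matrix Sparsity can be illustrated with a minimal sketch. The abstract does not say how the relation matrix is built or sparsified, so the choices below are assumptions made only for illustration: relatedness is measured by absolute Pearson correlation, and each channel keeps its `k` most related channels. The function name `sparse_relation_matrix` and its parameters are hypothetical and are not taken from the STHD code base.

```python
import numpy as np


def sparse_relation_matrix(series: np.ndarray, k: int) -> np.ndarray:
    """Return, for each channel, the indices of its k most related channels.

    Assumptions (not from the paper): relatedness is the absolute Pearson
    correlation, and sparsity is enforced by keeping the top-k entries per row.

    series: array of shape (channels, timesteps); no channel may be constant,
            otherwise its correlation is undefined (NaN).
    """
    num_channels = series.shape[0]
    if not 0 < k < num_channels:
        raise ValueError("k must satisfy 0 < k < number of channels")

    # Dense (channels x channels) relation matrix.
    strength = np.abs(np.corrcoef(series))

    # A channel is trivially correlated with itself, so exclude the diagonal.
    np.fill_diagonal(strength, -np.inf)

    # Keep only the k strongest relations per channel, strongest first.
    # Storing indices needs O(channels * k) memory instead of O(channels^2).
    return np.argsort(-strength, axis=1)[:, :k]
```

With `k` much smaller than the number of channels, each target series is paired only with a handful of related series, which is one plausible way to limit noise from unrelated channels and reduce memory use in the sense the abstract describes.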

Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers

Sparse Binary Transformers for Multivariate Time Series Modeling

Sparse Transformer with Local and Seasonal Adaptation for Multivariate Time Series Forecasting

Adversarial Sparse Transformer for Time Series Forecasting

A lightweight multi-layer perceptron for efficient multivariate time series forecasting

Scalable Transformer for High Dimensional Multivariate Time Series Forecasting

Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting

Are Transformers Effective for Time Series Forecasting?

Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

DRFormer: Multi-Scale Transformer Utilizing Diverse Receptive Fields for Long Time-Series Forecasting

An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers

Multivariate Time Series Modeling and Forecasting with Parallelized Convolution and Decomposed Sparse-Transformer

Resformer: Combine quadratic linear transformation with efficient sparse Transformer for long-term series forecasting

Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting

Two Steps Forward and One Behind: Rethinking Time Series Forecasting with Deep Learning

Enhanced Linear and Vision Transformer-Based Architectures for Time Series Forecasting

Transformer Acceleration with Dynamic Sparse Attention

TSLANet: Rethinking Transformers for Time Series Representation Learning

Transformer Multivariate Forecasting: Less is More?