Abstract: Despite significant progress in time series forecasting, existing forecasters often overlook the heterogeneity between long-range and short-range time series, leading to performance degradation in practical applications. In this work, we highlight the need for distinct objectives tailored to different ranges. We point out that time series can be decomposed into global patterns and local variations, which should be addressed separately in long- and short-range time series. To meet these objectives, we propose a multi-scale hybrid Mamba-Transformer experts model, State Space Transformer (SST). SST leverages Mamba as an expert to extract global patterns in coarse-grained long-range time series, and Local Window Transformer (LWT) as the other expert to capture local variations in fine-grained short-range time series. With an input-dependent mechanism, the State Space Model (SSM)-based Mamba is able to selectively retain long-term patterns and filter out fluctuations, while LWT employs a local window to enhance locality awareness, thus effectively capturing local variations. To adaptively integrate the global patterns and local variations, a long-short router dynamically adjusts the contributions of the two experts. SST achieves superior performance while scaling linearly, $O(L)$, in the time series length $L$. Comprehensive experiments demonstrate that SST can achieve SOTA results in long-short range time series forecasting while maintaining a low memory footprint and computational cost. The code of SST is available at https://github.com/XiongxiaoXu/SST.

What problem does this paper attempt to address?

The paper addresses a gap in time series forecasting: existing methods often overlook the heterogeneity between long-range and short-range time series, which degrades predictive performance in practical applications. For long-range time series, the key is to capture global patterns (such as recurring upward and downward trends); for short-range time series, the model needs to attend to local variations (such as extreme values or sudden fluctuations). The paper therefore proposes SST, a multi-scale hybrid Mamba-Transformer experts model designed to handle the characteristics of the data at each time scale.

SST first processes the input with a multi-resolution framework that distinguishes long-range from short-range data by adjusting the resolution of the time series. For long-range time series, larger strides and patch lengths produce a low-resolution series in which global patterns are easier to identify; for short-range time series, smaller strides and patch lengths produce a high-resolution series that preserves local variations. SST then applies a hybrid Mamba-Transformer architecture: Mamba acts as the global-pattern expert and extracts long-term patterns from the long-range series, while the Local Window Transformer (LWT) acts as the local-variation expert and captures local details in the short-range series. To integrate the outputs of the two experts, the paper designs a long-short router that dynamically adjusts how much each expert contributes.

SST has linear complexity O(L), where L is the length of the time series, so its computational cost and memory consumption grow only in proportion to the input length, even for very long series.

In the experimental section, the authors validated the model on seven widely used real-world datasets, including ETTh1, ETTh2, ETTm, etc. The results show that SST achieved state-of-the-art performance in long-term and short-term time series forecasting tasks while maintaining low memory usage and computational cost.
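
As a rough illustration of the multi-resolution step described above, the following sketch slices a look-back window into a coarse long-range view and a fine short-range view. It is not taken from the SST code: the function names `patchify` and `multi_resolution_views`, the parameter names, the concrete patch lengths and strides, and the choice to take the short-range view from the most recent points are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def patchify(series: Sequence[float], patch_len: int, stride: int) -> List[List[float]]:
    """Slice a 1-D series into patches of length patch_len, moving by stride."""
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    # A series shorter than patch_len yields no patches.
    return [list(series[s:s + patch_len]) for s in range(0, last_start + 1, stride)]


def multi_resolution_views(
    series: Sequence[float],
    short_len: int,
    long_patch: int,
    long_stride: int,
    short_patch: int,
    short_stride: int,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (coarse long-range patches, fine short-range patches).

    Assumption for this sketch: the short-range series is the last
    short_len points of the look-back window.
    """
    if short_len <= 0:
        raise ValueError("short_len must be positive")
    # Large patches and strides: low resolution, intended for global patterns.
    long_view = patchify(series, long_patch, long_stride)
    # Small patches and strides: high resolution, intended for local variations.
    short_view = patchify(series[-short_len:], short_patch, short_stride)
    return long_view, short_view


if __name__ == "__main__":
    lookback = [float(t) for t in range(16)]
    long_view, short_view = multi_resolution_views(
        lookback, short_len=8, long_patch=8, long_stride=4, short_patch=4, short_stride=2
    )
    print("long-range patches:", long_view)
    print("short-range patches:", short_view)
```

In a full model, each patch would be embedded as one token, so the coarse view gives the global-pattern expert fewer, wider tokens over the whole window, while the fine view gives the local-variation expert narrow tokens over the recent past.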

SST: Multi-Scale Hybrid Mamba-Transformer Experts for Long-Short Range Time Series Forecasting

Foreformer: an Enhanced Transformer-Based Framework for Multivariate Time Series Forecasting

Hidformer: Hierarchical Dual-Tower Transformer Using Multi-Scale Mergence for Long-Term Time Series Forecasting

Integrating Mamba and Transformer for Long-Short Range Time Series Forecasting

UmambaTSF: A U-shaped Multi-Scale Long-Term Time Series Forecasting Method Using Mamba

Integration of Mamba and Transformer -- MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics

Bi-Mamba4TS: Bidirectional Mamba for Time Series Forecasting

Generalizable Memory-driven Transformer for Multivariate Long Sequence Time-series Forecasting

Is Mamba Effective for Time Series Forecasting?

Effective LSTMs with Seasonal-Trend Decomposition and Adaptive Learning and Niching-Based Backtracking Search Algorithm for Time Series Forecasting

RSMformer: an efficient multiscale transformer-based framework for long sequence time-series forecasting

Bi-Mamba+: Bidirectional Mamba for Time Series Forecasting

Scalable Transformer for High Dimensional Multivariate Time Series Forecasting

Multi-scale convolution enhanced transformer for multivariate long-term time series forecasting

Multi-resolution Time-Series Transformer for Long-term Forecasting

TCLN: A Transformer-based Conv-LSTM Network for Multivariate Time Series Forecasting

A Mamba Foundation Model for Time Series Forecasting

Enhancing Time Series Forecasting: A Hierarchical Transformer with Probabilistic Decomposition Representation

Long Time Series Deep Forecasting with Multiscale Feature Extraction and Seq2seq Attention Mechanism

sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting

Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers