SST: Multi-Scale Hybrid Mamba-Transformer Experts for Long-Short Range Time Series Forecasting

Xiongxiao Xu, Canyu Chen, Yueqing Liang, Baixiang Huang, Guangji Bai, Liang Zhao, Kai Shu
2024-08-23
Abstract: Despite significant progress in time series forecasting, existing forecasters often overlook the heterogeneity between long-range and short-range time series, leading to performance degradation in practical applications. In this work, we highlight the need for distinct objectives tailored to different ranges. We point out that time series can be decomposed into global patterns and local variations, which should be addressed separately in long- and short-range time series. To meet these objectives, we propose a multi-scale hybrid Mamba-Transformer experts model, State Space Transformer (SST). SST leverages Mamba as an expert to extract global patterns in coarse-grained long-range time series, and Local Window Transformer (LWT) as the other expert to capture local variations in fine-grained short-range time series. With an input-dependent mechanism, the State Space Model (SSM)-based Mamba is able to selectively retain long-term patterns and filter out fluctuations, while LWT employs a local window to enhance its locality-awareness, thus effectively capturing local variations. To adaptively integrate the global patterns and local variations, a long-short router dynamically adjusts the contributions of the two experts. SST achieves superior performance while scaling linearly, $O(L)$, in the time series length $L$. Comprehensive experiments demonstrate that SST achieves state-of-the-art (SOTA) results in long-short range time series forecasting while maintaining a low memory footprint and computational cost. The code of SST is available at https://github.com/XiongxiaoXu/SST.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses a gap in time series forecasting: existing methods often overlook the heterogeneity between long-range and short-range time series, which degrades predictive performance in practical applications. Specifically, the paper argues that for long-range time series the key is to capture global patterns (such as recurring upward and downward trends), while for short-range time series the focus should be on local variations (such as extreme values or sudden fluctuations). It therefore proposes SST, a multi-scale hybrid Mamba-Transformer experts model designed to handle the characteristics of each time scale.
SST first processes the input with a multi-resolution framework that distinguishes long-range from short-range data by adjusting the resolution of the time series. For long-range time series, larger strides (step sizes) and patch (block) lengths produce a low-resolution series in which global patterns are easier to identify; for short-range time series, smaller strides and patch lengths produce a high-resolution series that preserves local variations. SST then applies a hybrid Mamba-Transformer architecture: Mamba acts as the global-pattern expert, extracting long-term patterns from the long-range series, while the Local Window Transformer (LWT) acts as the local-variation expert, capturing local details in the short-range series. A long-short router integrates the two outputs by dynamically adjusting each expert's contribution.
SST has linear complexity $O(L)$, where $L$ is the length of the time series, so computational cost and memory consumption grow only in proportion to the input length and remain low even for very long series.
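The multi-resolution step and the router described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the helper names (`make_patches`, `softmax`, `route`), the toy patch (block) lengths and strides (step sizes), the choice to treat the last 48 steps as the short range, and the use of a two-way softmax for the router are all assumptions made for illustration. In the real model the router scores would be learned from the input rather than passed in by hand.

```python
import math


def make_patches(series, patch_len, stride):
    """Split a 1-D series into (possibly overlapping) patches.

    Larger patch_len/stride give fewer, coarser patches (low resolution);
    smaller values give more, finer patches (high resolution).
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(global_out, local_out, router_scores):
    """Blend two expert outputs with softmax weights (hypothetical router)."""
    w_global, w_local = softmax(router_scores)
    return [w_global * g + w_local * l for g, l in zip(global_out, local_out)]


# Toy look-back window of length L = 96.
series = [float(t) for t in range(96)]

# Coarse view for the global-pattern expert: long patches, large stride.
long_patches = make_patches(series, patch_len=48, stride=16)

# Fine view for the local-variation expert: short patches, small stride.
# Using only the most recent 48 steps is an assumption of this sketch.
short_patches = make_patches(series[-48:], patch_len=16, stride=8)

# Equal router scores give each expert the same weight.
blended = route([1.0, 2.0], [3.0, 4.0], router_scores=[0.0, 0.0])

print(len(long_patches), len(short_patches), blended)
```

Both `make_patches` and `route` do work proportional to the length of their inputs, which is in keeping with the linear-complexity claim, although the cost of the actual experts depends on the Mamba and LWT implementations themselves.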
In the experimental section, the authors validate the model on seven widely used real-world datasets, such as ETTh1, ETTh2, and ETTm. The results show that SST achieves state-of-the-art performance in both long-range and short-range time series forecasting tasks while maintaining low memory usage and computational cost.