A Mamba Foundation Model for Time Series Forecasting

Haoyu Ma, Yushu Chen, Wenlai Zhao, Jinzhe Yang, Yingsheng Ji, Xinghua Xu, Xiaozhu Liu, Hao Jing, Shengzhuo Liu, Guangwen Yang
2024-11-05
Abstract: Time series foundation models have demonstrated strong performance in zero-shot learning, making them well-suited for predicting rapidly evolving patterns in real-world applications where relevant training data are scarce. However, most of these models rely on the Transformer architecture, which incurs quadratic complexity as input length increases. To address this, we introduce TSMamba, a linear-complexity foundation model for time series forecasting built on the Mamba architecture. The model captures temporal dependencies through both forward and backward Mamba encoders, achieving high prediction accuracy. To reduce reliance on large datasets and lower training costs, TSMamba employs a two-stage transfer learning process that leverages pretrained Mamba LLMs, allowing effective time series modeling with a moderate training set. In the first stage, the forward and backward backbones are optimized via patch-wise autoregressive prediction; in the second stage, the model trains a prediction head and refines other components for long-term forecasting. While the backbone assumes channel independence to manage varying channel numbers across datasets, a channel-wise compressed attention module is introduced to capture cross-channel dependencies during fine-tuning on specific multivariate datasets. Experiments show that TSMamba's zero-shot performance is comparable to state-of-the-art time series foundation models, despite using significantly less training data. It also achieves competitive or superior full-shot performance compared to task-specific prediction models. The code will be made publicly available.
Machine Learning, Artificial Intelligence
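The abstract's first training stage, patch-wise autoregressive prediction, can be illustrated with a small sketch. It assumes non-overlapping patches of a fixed length and drops any incomplete tail; the abstract does not state the patch length, stride, or padding that TSMamba actually uses, and the helper names `make_patches` and `autoregressive_pairs` are hypothetical.

```python
from typing import List, Tuple

Patch = List[float]


def make_patches(series: List[float], patch_len: int) -> List[Patch]:
    """Split a univariate series into non-overlapping patches.

    Assumption: an incomplete trailing patch is dropped.
    """
    n_patches = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]


def autoregressive_pairs(patches: List[Patch]) -> List[Tuple[List[Patch], Patch]]:
    """Pair every prefix of patches (the context) with the next patch (the target)."""
    return [(patches[:i], patches[i]) for i in range(1, len(patches))]


# Toy series 0.0, 1.0, ..., 9.0 with a hypothetical patch length of 3.
series = [float(t) for t in range(10)]
patches = make_patches(series, patch_len=3)
pairs = autoregressive_pairs(patches)

for context, target in pairs:
    print(f"context patches: {len(context)}, target: {target}")
```

Each pair asks a model to predict one patch from all the patches before it, which is the sense in which the objective is autoregressive at the patch level rather than at the level of single time steps.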
What problem does this paper attempt to address?
The paper addresses the difficulty existing time series forecasting models have with rapidly changing patterns. Specifically:

1. **Data Scarcity**: Traditional supervised models must be trained on task-specific datasets, but in practice data for newly emerging patterns may not exist yet or may be hard to collect.
2. **Lack of Generalization**: These models typically perform well within a specific domain or task but struggle to generalize across domains or sampling frequencies, so adapting them from one domain to another is costly and time-consuming.
3. **Low Data Efficiency**: When training data are limited, these models are prone to overfitting.

To tackle these issues, the paper introduces TSMamba, a time series foundation model based on the Mamba architecture. Its main features are:

- **Linear Complexity**: Because it is built on Mamba, TSMamba's cost grows linearly with input length, avoiding the quadratic growth of Transformer-based models.
- **Two-Stage Transfer Learning**: Starting from pretrained Mamba language models, a two-stage transfer learning process adapts the model to time series data while reducing its dependence on large datasets and lowering training costs.
- **Multivariate Data Handling**: The backbone treats channels independently; a channel-wise compressed attention module is added during fine-tuning on a specific multivariate dataset to capture cross-channel dependencies.

In summary, the paper aims to develop an efficient, generalizable, and data-efficient time series forecasting model that copes with shifting patterns and scarce data in real-world settings.
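To make the linear-complexity and forward/backward encoder ideas concrete, here is a toy sketch of a scalar linear recurrence run in both directions. It is not Mamba's selective scan: in Mamba the recurrence parameters depend on the input, whereas the fixed scalars `a` and `b` below, the function names, and the choice to pair the two states per step are all assumptions made for illustration.

```python
from typing import List, Tuple


def linear_scan(xs: List[float], a: float = 0.5, b: float = 1.0) -> List[float]:
    """Run h_t = a * h_{t-1} + b * x_t over the sequence.

    One pass over the input, so the cost grows linearly with len(xs).
    """
    h = 0.0
    states = []
    for x in xs:
        h = a * h + b * x
        states.append(h)
    return states


def bidirectional_scan(
    xs: List[float], a: float = 0.5, b: float = 1.0
) -> List[Tuple[float, float]]:
    """Return (forward_state, backward_state) for every time step.

    The backward pass scans the reversed sequence and is flipped back so that
    both states at index t refer to the same time step.
    """
    forward = linear_scan(xs, a, b)
    backward = linear_scan(xs[::-1], a, b)[::-1]
    return list(zip(forward, backward))


xs = [1.0, 2.0, 3.0]
states = bidirectional_scan(xs)
print(states)
```

The forward state at step t summarizes the inputs up to t and the backward state summarizes the inputs from t onward, and both come from single passes over the sequence. Self-attention, by contrast, compares every pair of positions, which is the source of the quadratic cost mentioned above.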