Abstract: Recent advancements in multivariate time series forecasting have been propelled by Linear-based, Transformer-based, and Convolution-based models, with Transformer-based architectures gaining prominence for their efficacy in temporal and cross-channel mixing. More recently, Mamba, a state space model, has emerged with robust sequence and feature mixing capabilities. However, the suitability of the vanilla Mamba design for time series forecasting remains an open question, particularly due to its inadequate handling of cross-channel dependencies. Capturing cross-channel dependencies is critical in enhancing the performance of multivariate time series prediction. Recent findings show that self-attention excels in capturing cross-channel dependencies, whereas other simpler mechanisms, such as MLP, may degrade model performance. This is counterintuitive, as MLP, being a learnable architecture, should theoretically capture both correlations and irrelevances, potentially leading to neutral or improved performance. Diving into the self-attention mechanism, we attribute the observed degradation in MLP performance to its lack of data dependence and global receptive field, which result in MLP's lack of generalization ability. Based on the above insights, we introduce a refined Mamba variant tailored for time series forecasting. Our proposed model, **CMamba**, incorporates a modified Mamba (M-Mamba) module for temporal dependencies modeling, a global data-dependent MLP (GDD-MLP) to effectively capture cross-channel dependencies, and a Channel Mixup mechanism to mitigate overfitting. Comprehensive experiments conducted on seven real-world datasets demonstrate the efficacy of our model in improving forecasting performance.
What problem does this paper attempt to address?
The paper addresses a weakness of existing methods, such as the Mamba model, in multivariate time series forecasting: inadequate handling of cross-channel dependencies. Although Mamba performs well at sequence and feature mixing, its weak modeling of cross-channel dependencies limits its forecasting performance on multivariate series. The paper argues that capturing these dependencies is crucial for improving multivariate time series prediction.
### Main contributions of the paper
1. **Improved Mamba module (M-Mamba)**:
- A modified version of the Mamba module (M-Mamba) is proposed for better modeling of temporal dependencies.
- The convolution operation and the feature-specific transition matrix \(A\) are removed, and the skip-connection matrix \(D\) is made data-dependent.
2. **Global data-dependent multi-layer perceptron (GDD-MLP)**:
- The GDD-MLP module is introduced to capture cross-channel dependencies through data-dependent weights and biases.
- GDD-MLP gives the plain MLP the two properties it lacks relative to self-attention: data dependence and a global receptive field.
3. **Channel mixing strategy (Channel Mixup)**:
- A channel mixing strategy is proposed. Linearly combining different channels creates virtual channels, which improves the model's generalization ability and reduces overfitting.
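
To make the idea of virtual channels concrete, here is a minimal NumPy sketch of a Channel Mixup style augmentation. It is an illustration, not the paper's implementation: the function name `channel_mixup`, the choice of one random partner channel per channel, the Gaussian mixing coefficient, and the value of `sigma` are all assumptions.

```python
import numpy as np

def channel_mixup(x, sigma=0.5, rng=None):
    """Create a virtual sample by linearly combining channels.

    x: array of shape (length, channels).
    sigma: std of the mixing coefficient (hypothetical default).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_channels = x.shape[1]
    # Assumption: each channel is paired with one randomly chosen partner.
    perm = rng.permutation(n_channels)
    # Assumption: a single coefficient drawn from N(0, sigma^2) per call.
    lam = rng.normal(0.0, sigma)
    # Every output channel is a linear combination of two input channels.
    return x + lam * x[:, perm]

rng = np.random.default_rng(0)
x = rng.standard_normal((96, 7))      # 96 time steps, 7 channels
virtual = channel_mixup(x, sigma=0.5, rng=rng)
unchanged = channel_mixup(x, sigma=0.0, rng=rng)  # sigma = 0 disables mixing
```

During training, the same permutation and coefficient would presumably be applied to the input window and the target window so that both stay consistent; at inference time the augmentation is simply skipped.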
### Method overview
- **CMamba framework**:
- The input multivariate time series is first processed by the Channel Mixup module to generate virtual samples.
- These samples are then instance-normalized, split into patches, and fed into the CMamba encoder.
- The CMamba encoder consists of multiple CMamba blocks, each containing an M-Mamba module and a GDD-MLP module.
- Finally, the prediction results are generated through a linear layer.
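
The preprocessing stage of this pipeline, instance normalization followed by splitting each channel into fixed-length segments (patches), can be sketched as follows. This is a rough illustration under stated assumptions: the helper names, `patch_len=16`, `stride=8`, and the absence of padding are hypothetical choices, not values taken from the paper.

```python
import numpy as np

def instance_normalize(x, eps=1e-5):
    """Normalize each channel of one sample over time.

    x: array of shape (length, channels).
    Returns the normalized series plus the statistics, which can be kept
    to map forecasts back to the original scale.
    """
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)
    return (x - mean) / (std + eps), mean, std

def split_into_patches(x, patch_len=16, stride=8):
    """Cut each channel into overlapping segments.

    x: (length, channels) -> (channels, n_patches, patch_len).
    patch_len and stride are hypothetical values; no padding is applied.
    """
    starts = range(0, x.shape[0] - patch_len + 1, stride)
    # Stacked shape: (n_patches, patch_len, channels)
    patches = np.stack([x[s:s + patch_len] for s in starts])
    return patches.transpose(2, 0, 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((96, 7)) * 3.0 + 10.0   # 96 steps, 7 channels
x_norm, mean, std = instance_normalize(x)
patches = split_into_patches(x_norm)            # one patch sequence per channel
```

Each channel then becomes a short sequence of patch tokens that an encoder can process along the patch axis, while a cross-channel module operates along the channel axis.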
### Experimental results
- **Datasets**:
- The paper conducted experiments on seven widely used datasets: ETTm1, ETTm2, ETTh1, ETTh2, Electricity, Weather, and Traffic.
- **Baseline models**:
- Ten advanced models were selected as baselines, including linear-based models, Transformer-based models, and convolution-based models.
- **Performance comparison**:
- CMamba performs best in most settings, particularly on datasets with a large number of time series (such as Electricity, Weather, and Traffic), and its performance is comparable to or better than that of iTransformer.
- The experimental results show that the GDD-MLP module achieves data dependence and a global receptive field comparable to those of the self-attention mechanism at a significantly reduced computational cost.
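
The two properties attributed to GDD-MLP, data dependence and a global receptive field, can be illustrated with a small NumPy sketch. The structure below is an assumption made for illustration only: per-channel weights and biases are generated from a descriptor pooled over all channels and then applied as an affine transform. The pooling choice, the `tanh` hidden layer, the sigmoid gates, and all names and sizes are hypothetical rather than the paper's exact design.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def small_mlp(v, w1, w2):
    # Two-layer perceptron acting across the channel axis: (C,) -> (C,)
    return w2 @ np.tanh(w1 @ v)

def gdd_mlp(x, params):
    """x: (channels, d_model). Returns an array of the same shape."""
    # Global receptive field: one pooled descriptor per channel, and every
    # generated weight and bias sees the descriptors of all channels.
    desc = x.mean(axis=1)
    # Data dependence: the weights and biases are functions of the input.
    weight = sigmoid(small_mlp(desc, *params["weight"]))
    bias = sigmoid(small_mlp(desc, *params["bias"]))
    return weight[:, None] * x + bias[:, None]

rng = np.random.default_rng(0)
C, D, H = 7, 32, 4   # channels, embedding size, hidden size (hypothetical)
params = {
    "weight": (rng.standard_normal((H, C)), rng.standard_normal((C, H))),
    "bias": (rng.standard_normal((H, C)), rng.standard_normal((C, H))),
}
x = rng.standard_normal((C, D))
y = gdd_mlp(x, params)

# Perturbing channel 0 changes the output of channel 1 as well, because the
# generated weights and biases depend on all channels.
x_shifted = x.copy()
x_shifted[0] += 1.0
y_shifted = gdd_mlp(x_shifted, params)
```

In this sketch the cost is linear in the embedding size per channel plus a small MLP over the channel descriptors, whereas self-attention across channels forms a channels-by-channels score matrix; this is one plausible reading of why such a module can be cheaper.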
### Conclusion
By modifying the Mamba module and introducing the GDD-MLP and Channel Mixup modules, the paper addresses the cross-channel dependency problem in multivariate time series forecasting and improves forecasting performance. The experimental results on seven real-world datasets support the effectiveness of these changes.