Abstract: Transformer-based and MLP-based methods have emerged as leading approaches in time series forecasting (TSF). While Transformer-based methods excel in capturing long-range dependencies, they suffer from high computational complexity and tend to overfit. Conversely, MLP-based methods offer computational efficiency and adeptness in modeling temporal dynamics, but they struggle to capture complex temporal patterns effectively. To address these challenges, we propose a novel MLP-based Adaptive Multi-Scale Decomposition (AMD) framework for TSF. Our framework decomposes time series into distinct temporal patterns at multiple scales, leveraging the Multi-Scale Decomposable Mixing (MDM) block to dissect and aggregate these patterns in a residual manner. Complemented by the Dual Dependency Interaction (DDI) block and the Adaptive Multi-predictor Synthesis (AMS) block, our approach effectively models both temporal and channel dependencies and utilizes autocorrelation to refine multi-scale data integration. Comprehensive experiments demonstrate that our AMD framework not only overcomes the limitations of existing methods but also consistently achieves state-of-the-art performance in both long-term and short-term forecasting tasks across various datasets, showcasing superior efficiency. Code is available at https://github.com/TROUBADOUR000/AMD.

What problem does this paper attempt to address?

This paper attempts to address the limitations of existing methods in time series forecasting (TSF). Specifically:

1. **Problems with Transformer-based methods**:
   - **High computational complexity**: because of the self-attention mechanism, the computational cost of a Transformer grows quadratically with the sequence length.
   - **Overfitting**: on long sequences, self-attention may weaken temporal relationships and over-emphasize abrupt points, which leads to overfitting.
2. **Problems with MLP-based methods**:
   - **Difficulty in capturing complex temporal patterns**: although MLP-based methods are computationally efficient and model temporal dynamics well, the simplicity of linear mapping makes it hard for them to capture complex temporal patterns. This creates an information bottleneck that limits prediction accuracy.

To address these problems, the authors propose an MLP-based Adaptive Multi-Scale Decomposition (AMD) framework, which tackles them as follows:

- **Multi-scale decomposition**: decompose the time series into distinct temporal patterns at multiple scales, and use the Multi-Scale Decomposable Mixing (MDM) block to dissect and aggregate these patterns.
- **Dual dependency interaction**: model temporal and channel dependencies simultaneously with the Dual Dependency Interaction (DDI) block.
- **Adaptive multi-predictor synthesis**: use the Adaptive Multi-predictor Synthesis (AMS) block to generate weights adaptively for the different temporal patterns and combine these patterns for prediction.
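To make the multi-scale decomposition idea concrete, here is a minimal NumPy sketch. It assumes that the scale-i series f_i(x) is obtained by non-overlapping average pooling with factor d (so it has length ⌊L/d^i⌋) and that patterns are aggregated from coarse to fine with the linear recursion g_i(x) = f_i(x) + g_{i+1}(x)W_i from the formula summary. The helper names (`avg_pool_1d`, `multi_scale_decompose`, `residual_mix`) and the random mixing matrices are hypothetical illustrations, not the MDM block from the AMD repository, which learns its mixing layers.

```python
import numpy as np


def avg_pool_1d(x: np.ndarray, d: int) -> np.ndarray:
    """Non-overlapping average pooling (kernel = stride = d) along the last axis.

    Trailing samples that do not fill a whole window are dropped, so an input
    of length n yields floor(n / d) outputs.
    """
    n = (x.shape[-1] // d) * d
    return x[..., :n].reshape(*x.shape[:-1], n // d, d).mean(axis=-1)


def multi_scale_decompose(x: np.ndarray, d: int, num_scales: int) -> list:
    """Return [f_0(x), ..., f_{m-1}(x)], where f_i has length floor(L / d**i).

    Assumption: each coarser scale is produced by pooling the previous one.
    """
    scales = [x]
    for _ in range(1, num_scales):
        scales.append(avg_pool_1d(scales[-1], d))
    return scales


def residual_mix(scales: list, weights: list) -> np.ndarray:
    """Coarse-to-fine mixing: g_{m-1} = f_{m-1}, then g_i = f_i + g_{i+1} @ W_i.

    weights[i] has shape (len of scale i+1, len of scale i), i.e.
    floor(L / d**(i+1)) x floor(L / d**i), matching W_i in the formula.
    """
    g = scales[-1]
    for i in range(len(scales) - 2, -1, -1):
        g = scales[i] + g @ weights[i]
    return g


rng = np.random.default_rng(0)
C, L, d, m = 3, 96, 2, 4  # channels, input length, pooling factor, number of scales
x = rng.standard_normal((C, L))
scales = multi_scale_decompose(x, d, m)

# Hypothetical random mixing matrices; a trained model would learn these.
weights = [
    rng.standard_normal((s_next.shape[-1], s.shape[-1])) * 0.1
    for s, s_next in zip(scales[:-1], scales[1:])
]
g0 = residual_mix(scales, weights)
print([s.shape for s in scales])  # [(3, 96), (3, 48), (3, 24), (3, 12)]
print(g0.shape)  # (3, 96)
```

Mixing from the coarsest scale toward the finest lets each finer scale add its own detail on top of a projection of the coarser trend, which is one way to read the "residual manner" of aggregation described in the abstract.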
Through these improvements, the AMD framework not only overcomes the limitations of existing methods but also achieves state-of-the-art performance in long-term and short-term forecasting tasks on multiple datasets, demonstrating higher efficiency and accuracy.

### Formula summary

1. **Linear model prediction**:
   \[ \hat{Y} = XA \oplus b \in \mathbb{R}^{C \times L} \]
   where \( \oplus \) denotes column-wise addition of the bias \( b \).
2. **Multi-scale information transformation**:
   \[ g_i(x) = f_i(x) + g_{i+1}(x)\,W_i \]
   where \( W_i \in \mathbb{R}^{\lfloor L/d^{i+1} \rfloor \times \lfloor L/d^{i} \rfloor} \).
3. **Selector weights**:
   \[ S = \text{Softmax}\big(\text{TopK}(\text{Softmax}(Q(u)), k)\big) \]
   \[ Q(u) = \text{Decompose}(u) + \psi \cdot \text{Softplus}\big(\text{Decompose}(u) \cdot W_{\text{noise}}\big) \]
   where \( k \) is the number of dominant temporal patterns, \( \psi \sim \mathcal{N}(0, 1) \), and \( W_{\text{noise}} \in \mathbb{R}^{m \times m} \).
4. **Loss function**:
   \[ \mathcal{L} = \mathcal{L}_{\text{pred}} + \lambda_1 \mathcal{L}_{\text{selector}} + \lambda_2 \|\Theta\|^2 \]
   where \( \mathcal{L}_{\text{pred}} = \sum_{i=0}^{T} \|y_i - \hat{y}_i\|_2^2 \), \( \mathcal{L}_{\text{selector}} = \frac{\operatorname{Var}(S)}{\operatorname{Mean}(S)^2 + \epsilon} \), and \( \epsilon \) is a small positive constant that prevents numerical instability.
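The linear prediction, selector, and loss formulas in the formula summary can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `z` is a hypothetical stand-in for Decompose(u); ψ is drawn independently for every entry; TopK keeps the k largest probabilities and masks the rest to -inf so they receive exactly zero weight (the convention of noisy top-k gating in mixture-of-experts models); the selector loss is computed over every entry of S; b is broadcast to every channel row; and the λ values are placeholders.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis; -inf entries get weight 0."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def softplus(x: np.ndarray) -> np.ndarray:
    """log(1 + exp(x)) computed without overflow."""
    return np.logaddexp(0.0, x)


def linear_predict(X: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Y_hat = X A + b, with X of shape (C, n_in), A (n_in, n_out), b (n_out,).

    Assumption: b is broadcast (added) to every channel row.
    """
    return X @ A + b


def selector_weights(z, W_noise, k, rng):
    """S = Softmax(TopK(Softmax(Q(u)), k)); z (B, m) stands in for Decompose(u)."""
    psi = rng.standard_normal(z.shape)          # psi ~ N(0, 1), drawn per entry
    q = z + psi * softplus(z @ W_noise)          # noisy scores Q(u)
    p = softmax(q)
    idx = np.argsort(p, axis=-1)[..., -k:]      # positions of the k largest probabilities
    masked = np.full_like(p, -np.inf)
    np.put_along_axis(masked, idx, np.take_along_axis(p, idx, axis=-1), axis=-1)
    return softmax(masked)                       # non-top-k patterns get weight exactly 0


def selector_loss(S: np.ndarray, eps: float = 1e-8) -> float:
    """L_selector = Var(S) / (Mean(S)^2 + eps), computed over all entries of S."""
    return S.var() / (S.mean() ** 2 + eps)


def total_loss(y, y_hat, S, params, lam1=0.01, lam2=1e-4):
    """L = L_pred + lam1 * L_selector + lam2 * ||Theta||^2 (lam values are placeholders)."""
    l_pred = np.sum((y - y_hat) ** 2)            # sum of squared errors over all steps
    l_reg = sum(np.sum(p ** 2) for p in params)  # squared L2 norm of all parameters
    return l_pred + lam1 * selector_loss(S) + lam2 * l_reg


rng = np.random.default_rng(0)

# Linear baseline: C channels, n_in input steps, n_out forecast steps (arbitrary sizes).
C, n_in, n_out = 3, 96, 24
X = rng.standard_normal((C, n_in))
A = rng.standard_normal((n_in, n_out)) * 0.01
b = np.zeros(n_out)
Y_hat = linear_predict(X, A, b)
print(Y_hat.shape)  # (3, 24)

# Selector: batch of B inputs, m candidate temporal patterns, keep the top k.
B, m, k = 4, 6, 2
z = rng.standard_normal((B, m))                  # hypothetical stand-in for Decompose(u)
W_noise = rng.standard_normal((m, m)) * 0.1
S = selector_weights(z, W_noise, k, rng)
print((S > 0).sum(axis=-1))  # [2 2 2 2]

loss = total_loss(Y_hat, Y_hat, S, params=[A, b])
```

The selector loss is the squared coefficient of variation of S: it is zero when the weights are perfectly even and grows as they concentrate, so (under these assumptions) it discourages the selector from collapsing onto a few temporal patterns.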

Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting

Foreformer: an Enhanced Transformer-Based Framework for Multivariate Time Series Forecasting

TFEformer: Temporal Feature Enhanced Transformer for Multivariate Time Series Forecasting

Hidformer: Hierarchical Dual-Tower Transformer Using Multi-Scale Mergence for Long-Term Time Series Forecasting

Enhancing Time Series Forecasting: A Hierarchical Transformer with Probabilistic Decomposition Representation

TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

DRFormer: Multi-Scale Transformer Utilizing Diverse Receptive Fields for Long Time-Series Forecasting

Multi-scale convolution enhanced transformer for multivariate long-term time series forecasting

Scalable Transformer for High Dimensional Multivariate Time Series Forecasting

Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting

Ada-MSHyper: Adaptive Multi-Scale Hypergraph Transformer for Time Series Forecasting

NTDformer: A Multi-Scale Forecasting Model for Non-Stationary Multilevel Time Series

Multivariate Time Series Modeling and Forecasting with Parallelized Convolution and Decomposed Sparse-Transformer

Time Series Forecasting with Multi-scale Decomposition and Fourier Neural Operators

Multi-resolution Time-Series Transformer for Long-term Forecasting

TDG4MSF: A temporal decomposition enhanced graph neural network for multivariate time series forecasting

A hybrid framework for multivariate long-sequence time series forecasting

Multi-scale Transformer Pyramid Networks for Multivariate Time Series Forecasting

Sparse Transformer with Local and Seasonal Adaptation for Multivariate Time Series Forecasting

Multi-level deep domain adaptive adversarial model based on tensor-train decomposition for industrial time series forecasting

A Multiscale Interactive Recurrent Network for Time-Series Forecasting