Boosting MLPs with a Coarsening Strategy for Long-Term Time Series Forecasting

Nannan Bian, Minhong Zhu, Li Chen, Weiran Cai
2024-05-20
Abstract: Deep learning methods have shown their strengths in long-term time series forecasting. However, they often struggle to strike a balance between expressive power and computational efficiency. Resorting to multi-layer perceptrons (MLPs) provides a compromise solution, yet MLPs suffer from two critical problems caused by their intrinsic point-wise mapping mode: deficient contextual dependencies and an inadequate information bottleneck. Here, we propose the Coarsened Perceptron Network (CP-Net), featuring a coarsening strategy that alleviates these problems of prototype MLPs by forming information granules in place of solitary temporal points. CP-Net primarily uses a two-stage framework for extracting semantic and contextual patterns, which preserves correlations over larger timespans and filters out volatile noise. This is further enhanced by a multi-scale setting, in which patterns of diverse granularities are fused into a comprehensive prediction. Based purely on structurally simple convolutions, CP-Net maintains linear computational complexity and low runtime, while demonstrating an improvement of 4.1% over the SOTA method on seven forecasting benchmarks.
Machine Learning
### What problems does this paper attempt to solve?

This paper addresses two key problems in long-term multivariate time series forecasting:

1. **Deficient contextual dependencies**: Because of their point-wise mapping mode, traditional multi-layer perceptrons (MLPs) struggle to capture the rich contextual dependencies in time series, which degrades performance on complex series.
2. **Inadequate information bottleneck**: The same point-wise mapping mode also prevents the model from forming an effective information bottleneck, so redundant noise is not filtered out, further reducing prediction accuracy.

To address these problems, the authors propose the **Coarsened Perceptron Network (CP-Net)**, which improves MLPs through a **coarsening strategy**. Its main contributions are:

- **Two-stage coarsening framework**: CP-Net consists of two main modules, the Token Projection Block and the Contextual Sampling Block, which extract semantic and contextual patterns, respectively. By forming information granules, they alleviate the limitations of the point-wise mapping mode of MLPs.
- **Multi-scale fusion**: To capture temporal patterns of different granularities, CP-Net uses a multi-scale setting. Branches with different token lengths and sampling rates are combined, and their information is fused into a more comprehensive prediction.
- **Linear computational complexity**: All components of CP-Net are built on convolution operations, so the whole architecture has linear computational complexity and low runtime.
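The first stage (Token Projection) can be illustrated for a single channel with the minimal pure-Python sketch below. It rests on assumptions that the summary does not state: Conv1d uses a kernel size and stride both equal to the token length TL (non-overlapping tokens), the MLP is a single linear layer, and channels are processed independently. The names `token_conv1d`, `linear` and `token_projection` are hypothetical, not the authors' code.

```python
from typing import List


def token_conv1d(x: List[float], weights: List[float]) -> List[float]:
    """Non-overlapping 1D convolution (ASSUMED: kernel size = stride = TL).

    Each window of TL = len(weights) consecutive points becomes one token,
    so a length-I series yields I // TL tokens; a trailing remainder is dropped.
    """
    tl = len(weights)
    n_tokens = len(x) // tl
    return [
        sum(w * x[t * tl + k] for k, w in enumerate(weights))
        for t in range(n_tokens)
    ]


def linear(tokens: List[float], w: List[List[float]], b: List[float]) -> List[float]:
    """One linear layer (ASSUMED form of the MLP) mapping I // TL tokens to O steps."""
    return [sum(wi * ti for wi, ti in zip(row, tokens)) + bi for row, bi in zip(w, b)]


def token_projection(x_in: List[float], conv_w: List[float],
                     mlp_w: List[List[float]], mlp_b: List[float]) -> List[float]:
    """X_tp = MLP(Conv1d(X_in)) for one channel (channel independence ASSUMED)."""
    return linear(token_conv1d(x_in, conv_w), mlp_w, mlp_b)


# Usage: I = 8, TL = 4 with mean-pooling weights, O = 2 with an identity projection.
x_in = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x_tp = token_projection(x_in, [0.25] * 4, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(x_tp)  # [2.5, 6.5]
```

Each granule summarizes TL points, so the projection sees I // TL tokens rather than I solitary points; the convolution touches each input point once, so its cost grows linearly with I.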
On seven benchmark datasets, CP-Net improves on existing state-of-the-art methods (such as the Transformer-based PatchTST and the CNN-based TimesNet) by 4.1% in mean squared error (MSE) and 3.3% in mean absolute error (MAE).

### Formula summary

1. **Preliminary prediction (Token Projection)**:
\[ X_{tp}=\mathrm{TP}(X_{in}) = \mathrm{MLP}(\mathrm{Conv1d}(X_{in})) \]
where \(X_{in}\in\mathbb{R}^{I\times N}\) is the input sequence (input length \(I\), \(N\) variables), \(TL\) is the length of the coarse-grained tokens formed by \(\mathrm{Conv1d}(\cdot)\), and \(X_{tp}\in\mathbb{R}^{O\times N}\) is the preliminary prediction over the horizon \(O\).

2. **Contextual sampling**:
\[ X_{cs}=\mathrm{DilatedConv1d}(\mathrm{Concat}(X_{tp}, X_{in})) \]
\[ X_m = \mathrm{EquiConv1d}(X_{cs}) \]
where \(X_{cs}\in\mathbb{R}^{(I + O)\times N}\), \(\mathrm{DilatedConv1d}(\cdot)\) is a 1D convolution layer with a dilation rate, \(\mathrm{Concat}(\cdot)\) joins the preliminary prediction and the input along the time axis as a dedicated padding strategy, and \(\mathrm{EquiConv1d}(\cdot)\) is an equidistant convolution.

3. **Multi-scale fusion**:
\[ Y = \mathrm{Merge}\left(\sum_{(i,j)\in S}Y_{i,j}^m\right) \]
where \(S=\{(TL_i, SR_j)\}\) is the set of token-length and sampling-rate pairs over all branches, \(Y_{i,j}^m\) is the prediction of each branch, and \(\mathrm{Merge}(\cdot)\) is the multi-scale fusion operation.

Through these designs, CP-Net addresses the limitations of traditional MLPs in long-term time series forecasting and improves prediction performance.
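The contextual sampling and fusion steps above can be sketched for one channel in pure Python. Several details are assumptions rather than facts from the paper: when concatenating, the history is placed before the preliminary forecast; DilatedConv1d is a centered, zero-padded convolution; the equidistant convolution is modeled by a hypothetical `equidistant_sample` that sums points spaced by the sampling rate SR; and Merge applied to the sum is taken to mean division by the number of branches (an element-wise mean). All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dilated_conv1d(x: List[float], w: List[float], dilation: int) -> List[float]:
    """Centered 1D convolution with zero padding; output length equals input length.

    Tap j reads x[t + (j - len(w) // 2) * dilation]; assumes an odd kernel size.
    """
    half = len(w) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = t + (j - half) * dilation
            if 0 <= idx < len(x):  # zero padding outside the sequence
                acc += wj * x[idx]
        out.append(acc)
    return out


def equidistant_sample(x: List[float], horizon: int, w: List[float], rate: int) -> List[float]:
    """HYPOTHETICAL stand-in for EquiConv1d: for each of the last `horizon`
    positions, take a weighted sum of len(w) points spaced `rate` steps apart,
    ending at that position."""
    out = []
    for t in range(len(x) - horizon, len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = t - j * rate
            if idx >= 0:
                acc += wj * x[idx]
        out.append(acc)
    return out


def contextual_sampling(x_in: List[float], x_tp: List[float], conv_w: List[float],
                        dilation: int, equi_w: List[float], rate: int) -> List[float]:
    """X_cs = DilatedConv1d(Concat(...)), then X_m = EquiConv1d(X_cs), one channel."""
    x_cat = x_in + x_tp  # length I + O; ASSUMED order: history, then preliminary forecast
    x_cs = dilated_conv1d(x_cat, conv_w, dilation)
    return equidistant_sample(x_cs, len(x_tp), equi_w, rate)


def merge(branch_outputs: Dict[Tuple[int, int], List[float]]) -> List[float]:
    """ASSUMED Merge: element-wise mean over all (TL, SR) branch predictions."""
    preds = list(branch_outputs.values())
    return [sum(vals) / len(preds) for vals in zip(*preds)]


# Usage: identity dilated kernel, then two points spaced SR = 2 apart per output step.
x_m = contextual_sampling([1.0, 2.0, 3.0, 4.0], [5.0, 6.0], [0.0, 1.0, 0.0], 1, [1.0, 1.0], 2)
print(x_m)  # [8.0, 10.0]
y = merge({(4, 2): [1.0, 3.0], (8, 4): [3.0, 5.0]})
print(y)  # [2.0, 4.0]
```

Each branch applies fixed-size kernels, so its cost grows linearly with I + O; the dilation and sampling rate widen the receptive field without adding parameters.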