Cristian Challu, Kin G. Olivares, Boris N. Oreshkin, Federico Garza, Max Mergenthaler-Canseco, Artur Dubrawski
Abstract: Recent progress in neural forecasting accelerated improvements in the performance of large-scale forecasting systems. Yet, long-horizon forecasting remains a very difficult task. Two common challenges afflicting the task are the volatility of the predictions and their computational complexity. We introduce N-HiTS, a model which addresses both challenges by incorporating novel hierarchical interpolation and multi-rate data sampling techniques. These techniques enable the proposed method to assemble its predictions sequentially, emphasizing components with different frequencies and scales while decomposing the input signal and synthesizing the forecast. We prove that the hierarchical interpolation technique can efficiently approximate arbitrarily long horizons in the presence of smoothness. Additionally, we conduct extensive large-scale dataset experiments from the long-horizon forecasting literature, demonstrating the advantages of our method over the state-of-the-art methods, where N-HiTS provides an average accuracy improvement of almost 20% over the latest Transformer architectures while reducing the computation time by an order of magnitude (50 times). Our code is available at http://bit.ly/3VA5DoT.
What problem does this paper attempt to address?
This paper addresses two major challenges in long-horizon time-series forecasting: prediction volatility and computational complexity. As the prediction horizon grows, both the prediction error and the computational cost of neural network models increase sharply, which makes long-horizon forecasting particularly difficult. The paper proposes the N-HiTS model, which tackles both issues with novel hierarchical interpolation and multi-rate data sampling techniques.
### Specific description of the problems
1. **Prediction volatility**:
- As the prediction horizon increases, the predictions become more volatile and accuracy declines.
- For example, in electricity consumption prediction, the prediction error of the fully connected architecture deteriorates significantly as the prediction horizon increases (see Figure 1b).
2. **Computational complexity**:
- Existing neural network models, such as attention-based Transformers and fully connected architectures, incur rapidly growing computational cost and memory footprint as the prediction horizon increases.
- The computational complexity of these models is usually quadratic in the length of the prediction horizon, so training time and memory requirements rise steeply.
### Overview of the solution
To address the above problems, the paper proposes a new model, N-HiTS (Neural Hierarchical Interpolation for Time Series Forecasting), whose main innovations include:
1. **Hierarchical interpolation**:
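- Each block of the network predicts a reduced number of coefficients, and multi-scale hierarchical interpolation maps them onto the time scale of the final output, which keeps the prediction smooth.
- With \(g\) denoting the interpolation function and \(\theta_f^\ell\), \(\theta_b^\ell\) the forward and backward coefficients of block \(\ell\), the forecast and backcast are expressed as: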
\[
\hat{y}_{\tau,\ell} = g(\tau, \theta_f^\ell), \quad \forall \tau \in \{t + 1,\dots,t + H\}
\]
\[
\tilde{y}_{\tau,\ell} = g(\tau, \theta_b^\ell), \quad \forall \tau \in \{t - L,\dots,t\}
\]
2. **Multi-rate data sampling**:
- A MaxPool layer at the input of each block helps the model focus on input components of a specific scale.
- A larger pooling kernel size \(k_\ell\) suppresses the high-frequency, small-time-scale components, so the block focuses on large-scale, low-frequency content.
- Expressed by the formula:
\[
y_{t - L:t,\ell}^{(p)}=\text{MaxPool}(y_{t - L:t,\ell}, k_\ell)
\]
3. **Multi-scale prediction synthesis**:
- In each block, a multi-layer perceptron (MLP) nonlinearly regresses the forward and backward basis coefficients, from which the forecast and backcast are synthesized by interpolation.
- Each block focuses on a different frequency band, forming a hierarchical prediction structure that reduces memory footprint and computation time while keeping the architecture simple and accurate.
Through these innovations, N-HiTS improves long-horizon prediction accuracy by almost 20% on average and cuts computation time by a factor of about 50 relative to the latest Transformer architectures.
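
The three components above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and it rests on several assumptions. The pooling uses a stride equal to the kernel size \(k_\ell\). The interpolation function \(g\) is taken to be linear interpolation. The per-block coefficients are hard-coded instead of being produced by an MLP. The final forecast is assumed to be the sum of the block forecasts. The function names `max_pool`, `interpolate` and `synthesize` are hypothetical.

```python
def max_pool(series, kernel):
    """Multi-rate sampling: non-overlapping max pooling.

    Assumption: the stride equals the kernel size, so a larger kernel
    yields a shorter, smoother view of the input window.
    """
    return [max(series[i:i + kernel]) for i in range(0, len(series), kernel)]


def interpolate(coeffs, horizon):
    """Hierarchical interpolation: stretch a few coefficients to `horizon` points.

    Assumption: g is linear interpolation on an evenly spaced grid.
    """
    n = len(coeffs)
    if n == 1:
        # A single coefficient describes a constant level over the horizon.
        return [coeffs[0]] * horizon
    out = []
    for step in range(horizon):
        # Map the output index onto the coefficient grid [0, n - 1].
        pos = step * (n - 1) / (horizon - 1) if horizon > 1 else 0.0
        left = min(int(pos), n - 2)
        frac = pos - left
        out.append(coeffs[left] * (1 - frac) + coeffs[left + 1] * frac)
    return out


def synthesize(block_coeffs, horizon):
    """Multi-scale synthesis: add up the interpolated forecast of every block.

    Assumption: block forecasts are combined by summation.
    """
    forecast = [0.0] * horizon
    for coeffs in block_coeffs:
        part = interpolate(coeffs, horizon)
        forecast = [f + p for f, p in zip(forecast, part)]
    return forecast


if __name__ == "__main__":
    window = [1, 3, 2, 5, 4, 0]
    # A coarse block sees a pooled input; a fine block sees more detail.
    print(max_pool(window, 3))
    print(max_pool(window, 2))
    # Hypothetical coefficients: one low-frequency level plus a linear trend.
    print(synthesize([[1.0], [0.0, 2.0]], horizon=5))
```

In this sketch a block with few coefficients can only express slow variation over the horizon, while a block with more coefficients can add finer detail. It also shows why the network output can stay small even when the horizon is long: the number of coefficients per block does not have to grow with the horizon.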