Abstract: Recent progress in neural forecasting accelerated improvements in the performance of large-scale forecasting systems. Yet, long-horizon forecasting remains a very difficult task. Two common challenges afflicting the task are the volatility of the predictions and their computational complexity. We introduce N-HiTS, a model which addresses both challenges by incorporating novel hierarchical interpolation and multi-rate data sampling techniques. These techniques enable the proposed method to assemble its predictions sequentially, emphasizing components with different frequencies and scales while decomposing the input signal and synthesizing the forecast. We prove that the hierarchical interpolation technique can efficiently approximate arbitrarily long horizons in the presence of smoothness. Additionally, we conduct extensive large-scale dataset experiments from the long-horizon forecasting literature, demonstrating the advantages of our method over the state-of-the-art methods, where N-HiTS provides an average accuracy improvement of almost 20% over the latest Transformer architectures while reducing the computation time by an order of magnitude (50 times). Our code is available at http://bit.ly/3VA5DoT

What problem does this paper attempt to address?

This paper addresses two major challenges in long-horizon time series forecasting: prediction volatility and computational complexity. As the prediction horizon grows, both the prediction error and the computational cost of neural network models increase sharply, which makes long-horizon forecasting particularly difficult. The paper proposes the N-HiTS model, which tackles both issues with hierarchical interpolation and multi-rate data sampling.

### Specific description of the problems

1. **Prediction volatility**:
   - As the prediction horizon increases, the predictions become more volatile, and accuracy declines.
   - For example, in electricity consumption forecasting, the prediction error of the fully connected architecture deteriorates significantly as the prediction horizon increases (see Figure 1b).
2. **Computational complexity**:
   - The computational cost and memory footprint of existing neural network models (such as attention-based Transformers and fully connected architectures) grow rapidly as the prediction horizon increases.
   - The computational complexity of these models is usually quadratic in the length of the prediction horizon, so training time and memory requirements rise significantly.

### Overview of the solution

To address these problems, the paper proposes a new model, N-HiTS (Neural Hierarchical Interpolation for Time Series Forecasting), whose main innovations are:

1. **Hierarchical interpolation**:
   - Each block predicts only a low-dimensional set of coefficients, and multi-scale hierarchical interpolation maps them onto the time scale of the final output, which keeps the forecast smooth.
   - Expressed by the formulas below, where \(g\) is the interpolation function, \(\theta_f^\ell\) and \(\theta_b^\ell\) are the forward and backward coefficients of block \(\ell\), \(L\) is the length of the input window, and \(H\) is the prediction horizon:
     \[ \hat{y}_{\tau,\ell} = g(\tau, \theta_f^\ell), \quad \forall \tau \in \{t + 1,\dots,t + H\} \]
     \[ \tilde{y}_{\tau,\ell} = g(\tau, \theta_b^\ell), \quad \forall \tau \in \{t - L,\dots,t\} \]
2. **Multi-rate data sampling**:
   - A MaxPool layer at the input of each block helps the block focus on input components of a specific scale.
   - A larger pooling kernel size \(k_\ell\) suppresses high-frequency (small-time-scale) components, so the block concentrates on large-scale, low-frequency content.
   - Expressed by the formula:
     \[ y_{t - L:t,\ell}^{(p)} = \text{MaxPool}(y_{t - L:t,\ell}, k_\ell) \]
3. **Multi-scale prediction synthesis**:
   - In each block, a multi-layer perceptron (MLP) nonlinearly regresses the forward and backward basis coefficients, from which the prediction is then synthesized.
   - Each block focuses on signals in a different frequency band, forming a hierarchical prediction structure that reduces memory footprint and computation time while keeping the architecture simple and accurate.

With these techniques, N-HiTS improves long-horizon forecasting accuracy by almost 20% on average over the latest Transformer architectures and reduces computation time by an order of magnitude (50 times).
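The two mechanisms summarized above, MaxPool subsampling of the input window and interpolation of a short coefficient vector up to the full horizon, can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names `max_pool_1d` and `interpolate_linear` are hypothetical, the coefficient vectors are hard-coded placeholders standing in for MLP outputs, the knots are assumed to be spread evenly with aligned endpoints, and the block forecasts are assumed to be combined by summation.

```python
from typing import List


def max_pool_1d(window: List[float], kernel: int) -> List[float]:
    """Non-overlapping max pooling (stride == kernel).

    A larger kernel keeps fewer points, so fast fluctuations are discarded
    and only the coarse shape of the input window remains.
    """
    if kernel < 1:
        raise ValueError("kernel must be >= 1")
    return [max(window[i:i + kernel]) for i in range(0, len(window), kernel)]


def interpolate_linear(theta: List[float], horizon: int) -> List[float]:
    """Stretch len(theta) coefficients to `horizon` points.

    Assumption: knots are evenly spaced and the first/last knot coincide
    with the first/last forecast step. This plays the role of g(tau, theta).
    """
    n = len(theta)
    if n == 0 or horizon < 1:
        raise ValueError("need at least one coefficient and one forecast step")
    if n == 1 or horizon == 1:
        return [theta[0]] * horizon
    out = []
    for j in range(horizon):
        pos = j * (n - 1) / (horizon - 1)  # position in knot coordinates
        lo = min(int(pos), n - 2)          # index of the left neighbouring knot
        frac = pos - lo                    # distance from the left knot, in [0, 1]
        out.append(theta[lo] * (1 - frac) + theta[lo + 1] * frac)
    return out


# Multi-rate sampling: the same window seen at two different rates.
window = [1.0, 3.0, 2.0, 5.0, 4.0, 4.0, 0.0, 1.0]
coarse_input = max_pool_1d(window, kernel=4)  # [5.0, 4.0]
fine_input = max_pool_1d(window, kernel=1)    # unchanged window

# Hierarchical interpolation: placeholder coefficients (an MLP would
# produce these from the pooled input in a real model).
horizon = 5
coarse_block = interpolate_linear([0.0, 4.0], horizon)                 # 2 knots -> 5 steps
fine_block = interpolate_linear([1.0, -1.0, 1.0, -1.0, 1.0], horizon)  # 5 knots -> 5 steps

# Assumed additive combination of the per-block forecasts.
forecast = [c + f for c, f in zip(coarse_block, fine_block)]
print(coarse_input, forecast)
```

The coarse block needs only two coefficients to cover five forecast steps, which is where the savings come from: the number of values a block must predict no longer grows in proportion to the horizon.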

N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting
