An Analysis of Linear Time Series Forecasting Models

William Toner, Luke Darlow
2024-03-25
Abstract: Despite their simplicity, linear models perform well at time series forecasting, even when pitted against deeper and more expensive models. A number of variations to the linear model have been proposed, often including some form of feature normalisation that improves model generalisation. In this paper we analyse the sets of functions expressible using these linear model architectures. In so doing we show that several popular variants of linear models for time series forecasting are equivalent and functionally indistinguishable from standard, unconstrained linear regression. We characterise the model classes for each linear variant. We demonstrate that each model can be reinterpreted as unconstrained linear regression over a suitably augmented feature set, and therefore admits a closed-form solution when using a mean-squared loss function. We provide experimental evidence that the models under inspection learn nearly identical solutions, and finally demonstrate that the simpler closed-form solutions are superior forecasters across 72% of test settings.
Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is: linear models perform well in time series forecasting, but do the many popular variants of these models actually differ in what they can express or in how well they forecast? By analyzing the mathematical properties of these models, the authors show that most popular linear time series forecasting models are equivalent and reduce to standard unconstrained linear regression. Specifically, the main objectives of the paper are:

1. **Mathematical Analysis**: Conduct an in-depth mathematical analysis of several commonly used linear time series forecasting models, in particular of the sets of functions they can express.
2. **Model Equivalence**: Prove that these models are functionally equivalent, meaning that the function classes they parameterize are the same, up to the choice of data normalization.
3. **Experimental Validation**: Verify experimentally that these models converge to nearly the same solution during training, and that simple closed-form solutions often outperform models trained via gradient descent.

### Background of the Paper

- **Importance of Time Series Forecasting**: Accurate time series forecasting is crucial for decision-making and strategic planning in fields such as finance, meteorology, healthcare, cloud computing, and traffic management.
- **Application of Deep Learning**: Although deep learning has achieved significant success in fields like computer vision and natural language processing, its application to time series forecasting faces unique challenges, and performance improvements are often limited.
- **Advantages of Linear Models**: Linear models perform well in certain applications due to their simplicity, interpretability, and efficiency, especially in industries that require frequent queries or high-resolution data processing.

### Main Contributions

1. **Mathematical Proof**: Prove that several popular linear time series forecasting models are essentially the same, each corresponding to unconstrained or weakly constrained linear regression over a suitably augmented feature set.
2. **Experimental Evidence**: Provide experimental results showing that these models tend to converge to nearly the same solution during training, with only slight differences in the bias parameters.
3. **Superiority of Closed-Form Solutions**: Demonstrate that closed-form least-squares linear regression generally outperforms the same models trained via stochastic gradient descent.

### Related Work

- **DLinear and NLinear**: Two models proposed by Zeng et al., which have become common baselines in time series forecasting research.
- **Reversible Instance Normalization (RevIN)**: A feature normalization technique that often improves time series forecasting.
- **RLinear**: A linear mapping model using RevIN, used to explore the impact of channel independence (CI).
- **FITS**: A linear time series model operating in the frequency domain, with an optional high-frequency filtering component, whose performance is close to or at state-of-the-art levels.

### Model Analysis

- **DLinear**: Decomposes the input into trend and seasonal components and applies a linear map to each; the overall model is representable as a single affine linear function.
- **FITS**: Applies a Fourier transform, a complex linear map, and an inverse Fourier transform; the overall model is also representable as an affine linear function.
- **Normalization Strategies**: Discusses how normalization strategies such as Instance Normalization (IN), Reversible Instance Normalization (RevIN), and NowNorm constrain the model class.

### Experimental Results

- **Convergence of Weight Matrices**: Experiments show that the weight matrices of all models tend to converge to the closed-form solution during training.
- **Prediction Performance**: Closed-form solutions are the better forecasters in 72% of test settings, outperforming the models trained via gradient descent.
- **Differences in Bias Terms**: Despite similar weight matrices, the models differ in their learned bias terms, with FITS+IN in particular learning smaller bias terms.

Overall, through mathematical analysis and experimental validation, this paper shows that various popular linear time series forecasting models are equivalent in functionality and performance, providing a theoretical basis for simplifying model selection and improving forecasting efficiency.
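
Two of the claims summarized above can be checked numerically with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' code. The first part builds a DLinear-style model (a moving-average trend filter followed by separate linear maps on the trend and the remainder) and collapses it into a single weight matrix and bias. The second part fits an unconstrained linear forecaster in closed form by least squares over features augmented with a constant column. The helper names (`moving_average_matrix`, `collapse_dlinear`, `make_windows`, `fit_closed_form`), the kernel size, the edge-replication padding, and the synthetic sine series are all assumptions made for this example.

```python
import numpy as np


def moving_average_matrix(L, k):
    """L x L matrix A such that A @ x is a centred length-k moving average.

    Assumes odd k and edge-replication padding (an assumption about the
    trend filter, chosen for illustration)."""
    pad = (k - 1) // 2
    A = np.zeros((L, L))
    for i in range(L):
        for j in range(i - pad, i - pad + k):
            # Clamp out-of-range indices to the nearest edge sample.
            A[i, min(max(j, 0), L - 1)] += 1.0 / k
    return A


def dlinear_forward(x, A, W_s, b_s, W_t, b_t):
    """DLinear-style forecast: separate linear maps on trend and remainder."""
    trend = A @ x
    seasonal = x - trend
    return W_s @ seasonal + b_s + W_t @ trend + b_t


def collapse_dlinear(A, W_s, b_s, W_t, b_t):
    """Fold the decomposition into one affine map: y = W @ x + b."""
    I = np.eye(A.shape[0])
    return W_s @ (I - A) + W_t @ A, b_s + b_t


def make_windows(series, L, T):
    """Slice a 1-D series into (lookback, horizon) pairs."""
    n = len(series) - L - T + 1
    X = np.stack([series[i:i + L] for i in range(n)])
    Y = np.stack([series[i + L:i + L + T] for i in range(n)])
    return X, Y


def fit_closed_form(X, Y):
    """Least-squares fit of Y ~ X @ W.T + b via a constant feature column."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    theta, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    return theta[:-1].T, theta[-1]  # W has shape (T, L), b has shape (T,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, T, k = 24, 8, 5  # lookback, horizon, kernel size (illustrative values)

    # Part 1: the two-branch model and its collapsed form agree.
    A = moving_average_matrix(L, k)
    W_s, W_t = rng.normal(size=(T, L)), rng.normal(size=(T, L))
    b_s, b_t = rng.normal(size=T), rng.normal(size=T)
    W, b = collapse_dlinear(A, W_s, b_s, W_t, b_t)
    x = rng.normal(size=L)
    same = np.allclose(dlinear_forward(x, A, W_s, b_s, W_t, b_t), W @ x + b)
    print("collapsed model matches:", same)

    # Part 2: closed-form fit on a noisy synthetic sine wave.
    t = np.arange(600)
    series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)
    X, Y = make_windows(series, L, T)
    W_ols, b_ols = fit_closed_form(X[:400], Y[:400])
    # Skip L + T windows so test targets do not overlap training targets.
    X_te, Y_te = X[400 + L + T:], Y[400 + L + T:]
    print("held-out MSE:", np.mean((X_te @ W_ols.T + b_ols - Y_te) ** 2))
```

Because the trend filter is itself a fixed linear map, the two-branch model has no more expressive power than one matrix and one bias vector, so the least-squares fit in the second part searches the same function class directly, without gradient descent.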