DAM: Towards A Foundation Model for Time Series Forecasting

Luke Darlow, Qiwen Deng, Ahmed Hassan, Martin Asenov, Rajkarn Singh, Artjom Joosen, Adam Barker, Amos Storkey
2024-07-25
Abstract: It is challenging to scale time series forecasting models such that they forecast accurately for multiple distinct domains and datasets, all with potentially different underlying collection procedures (e.g., sample resolution), patterns (e.g., periodicity), and prediction requirements (e.g., reconstruction vs. forecasting). We call this general task universal forecasting. Existing methods usually assume that input data is regularly sampled, and they forecast to pre-determined horizons, resulting in failure to generalise outside of the scope of their training. We propose the DAM - a neural model that takes randomly sampled histories and outputs an adjustable basis composition as a continuous function of time for forecasting to non-fixed horizons. It involves three key components: (1) a flexible approach for using randomly sampled histories from a long-tail distribution, that enables an efficient global perspective of the underlying temporal dynamics while retaining focus on the recent history; (2) a transformer backbone that is trained on these actively sampled histories to produce, as representational output, (3) the basis coefficients of a continuous function of time. We show that a single univariate DAM, trained on 25 time series datasets, either outperformed or closely matched existing SoTA models at multivariate long-term forecasting across 18 datasets, including 8 held-out for zero-shot transfer, even though these models were trained to specialise for each dataset-horizon combination. This single DAM excels at zero-shot transfer and very-long-term forecasting, performs well at imputation, is interpretable via basis function composition and attention, can be tuned for different inference-cost requirements, and is robust to missing and irregularly sampled data by design.
Machine Learning
What problem does this paper attempt to address?
The paper addresses a central challenge in time series forecasting: building a single model that forecasts accurately across many domains and datasets. The authors propose DAM (Deep Data-Dependent Approximate Analytical Model) for this task, which they call "universal forecasting." Existing methods typically assume that the input is of fixed length and sampled at regular intervals, and that the forecasting horizon is preset, which limits how well they generalise beyond the conditions they were trained on. DAM is designed to remove these restrictions: it handles irregularly sampled, variable-length histories and forecasts to non-fixed horizons. It relies on three key components:

1. **Flexible history sampling**: Historical points are drawn at random from a long-tail distribution, which gives the model an efficient global view of the long-term dynamics while keeping most of its focus on recent data.
2. **Transformer backbone**: A transformer is trained on these sampled histories and produces the basis function coefficients used for forecasting.
3. **Adjustable basis function composition**: The output coefficients define a continuous function of time, so the model can be evaluated at any time point, including very distant horizons.

Experimental results show that a single univariate DAM, trained on 25 time series datasets, outperformed or closely matched existing state-of-the-art models at multivariate long-term forecasting across 18 datasets, 8 of which were held out for zero-shot transfer, even though those baselines were trained to specialise for each dataset-horizon combination. DAM also performs well at very-long-term forecasting and at imputation.
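The first and third components can be illustrated with a short sketch. This is not the authors' implementation: the Pareto-style sampling distribution, the sinusoidal basis, the two periods, and every function name here are assumptions chosen for illustration, and an ordinary least-squares fit stands in for the transformer backbone, which is not reproduced.

```python
import numpy as np


def sample_history_offsets(n_available, n_samples, scale, rng):
    """Draw 'steps back from now' from a long-tailed distribution.

    Assumption: a Lomax (Pareto II) tail is used purely as an example of a
    long-tail distribution; the text above does not say which one DAM uses.
    """
    raw = rng.pareto(1.5, size=n_samples) * scale
    # Clip to the available history, then truncate to integer offsets.
    return np.clip(raw, 0, n_available - 1).astype(int)


def basis_matrix(t, periods):
    """Evaluate sine and cosine basis functions at (possibly irregular) times t."""
    t = np.asarray(t, dtype=float)
    phase = 2.0 * np.pi * t[:, None] / periods[None, :]
    return np.hstack([np.sin(phase), np.cos(phase)])


def evaluate(t, periods, coeffs):
    """Continuous function of time defined by the basis coefficients."""
    return basis_matrix(t, periods) @ coeffs


def true_signal(t):
    """Synthetic series with a daily and a weekly cycle (hourly time units)."""
    t = np.asarray(t, dtype=float)
    return np.sin(2 * np.pi * t / 24) + 0.5 * np.cos(2 * np.pi * t / 168)


rng = np.random.default_rng(0)
periods = np.array([24.0, 168.0])  # hypothetical basis periods

# Most sampled points are recent; a few reach far back into the history.
offsets = sample_history_offsets(n_available=1000, n_samples=200, scale=50.0, rng=rng)
t_hist = -offsets.astype(float)  # time 0 is "now", the past is negative
y_hist = true_signal(t_hist)

# Stand-in for the transformer: least squares maps the history to coefficients.
coeffs, *_ = np.linalg.lstsq(basis_matrix(t_hist, periods), y_hist, rcond=None)

# The result can be queried at any horizon, including non-integer and distant times.
t_future = np.array([0.5, 12.0, 100.25, 5000.0])
forecast = evaluate(t_future, periods, coeffs)
```

Because the forecast is a function of time rather than a fixed-length vector, the same coefficients serve both short and very long horizons, and evaluating at negative times reconstructs (imputes) the past.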
In summary, the goal of this paper is to develop a single time series forecasting model that handles different domains, sampling resolutions, and forecast horizons, thereby improving generalization without sacrificing accuracy.