Abstract: Recent efforts have been dedicated to enhancing time series forecasting accuracy by introducing advanced network architectures and self-supervised pretraining strategies. Nevertheless, existing approaches still exhibit two critical drawbacks. First, these methods often rely on a single dataset for training, which limits the model's generalizability because of the restricted scale of the training data. Second, the one-step generation schema is widely followed; it necessitates a customized forecasting head, overlooks the temporal dependencies in the output series, and increases training costs because a separate model must be trained for each horizon length. To address these issues, we propose a novel generative pretrained hierarchical transformer architecture for forecasting, named \textbf{GPHT}. GPHT has two key design aspects. On the one hand, we advocate constructing a mixed dataset under the channel-independent assumption for pretraining our model, comprising various datasets from diverse data scenarios. This approach significantly expands the scale of training data, allowing our model to uncover commonalities in time series data and facilitating improved transfer to specific datasets. On the other hand, GPHT employs an auto-regressive forecasting approach, effectively modeling temporal dependencies in the output series. Importantly, no customized forecasting head is required, enabling \textit{a single model to forecast at arbitrary horizon settings.} We conduct extensive experiments on eight datasets, comparing against mainstream self-supervised pretraining models and supervised models. The results demonstrate that GPHT surpasses the baseline models across various fine-tuning and zero/few-shot learning settings in the traditional long-term forecasting task. We make our code publicly available\footnote{https://github.com/icantnamemyself/GPHT}.
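
The two design ideas named in the abstract can be illustrated with a minimal sketch. This is not the GPHT implementation: the function names `to_univariate`, `autoregressive_forecast`, and `naive_step` are hypothetical, and the predictor is a placeholder that simply repeats the last observed value. The sketch only shows how multivariate datasets might be pooled as independent univariate series, and how a fixed-size one-patch predictor can be rolled forward to reach any horizon without a horizon-specific head.

```python
import numpy as np

def to_univariate(datasets):
    """Channel-independent pooling: split each (time, channels) array
    into separate 1-D series and collect them in one list."""
    series = []
    for data in datasets:
        for c in range(data.shape[1]):
            series.append(data[:, c])
    return series

def autoregressive_forecast(step_fn, context, horizon):
    """Repeatedly call a one-patch predictor, feeding its own output back
    as input, until at least `horizon` values have been produced."""
    if horizon <= 0:
        return np.empty(0)
    history = np.asarray(context, dtype=float)
    window = len(history)  # keep the input length fixed across steps
    preds = []
    produced = 0
    while produced < horizon:
        patch = np.asarray(step_fn(history[-window:]), dtype=float)
        preds.append(patch)
        produced += len(patch)
        history = np.concatenate([history, patch])
    # Truncate because the last patch may overshoot the requested horizon.
    return np.concatenate(preds)[:horizon]

def naive_step(window, patch_len=4):
    """Placeholder predictor (an assumption, standing in for a trained
    model): repeats the last observed value for one patch."""
    return np.full(patch_len, window[-1])

# Pool two toy datasets with different channel counts and lengths.
mixed = to_univariate([np.zeros((10, 3)), np.ones((5, 2))])

# The same predictor serves horizons of 10 and 25 without retraining.
short = autoregressive_forecast(naive_step, np.arange(1, 9), horizon=10)
long = autoregressive_forecast(naive_step, np.arange(1, 9), horizon=25)
```

Under these assumptions the horizon is only a loop bound, which is why a single model can serve every horizon setting; the trade-off is that prediction errors can accumulate as outputs are fed back as inputs.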

Generative Pretrained Hierarchical Transformer for Time Series Forecasting

MMFNet: Multi-Scale Frequency Masking Neural Network for Multivariate Time Series Forecasting

Adapt to Small-Scale and Long-Term Time Series Forecasting with Enhanced Multidimensional Correlation