Diffusion Auto-regressive Transformer for Effective Self-supervised Time Series Forecasting

Daoyu Wang, Mingyue Cheng, Zhiding Liu, Qi Liu, Enhong Chen
2024-10-08
Abstract: Self-supervised learning has become a popular and effective approach for enhancing time series forecasting, enabling models to learn universal representations from unlabeled data. However, effectively capturing both the global sequence dependence and local detail features within time series data remains challenging. To address this, we propose a novel generative self-supervised method called TimeDART, denoting Diffusion Auto-regressive Transformer for Time series forecasting. In TimeDART, we treat time series patches as basic modeling units. Specifically, we employ a self-attention-based Transformer encoder to model inter-patch dependencies. Additionally, we introduce diffusion and denoising mechanisms to capture fine-grained local features within each patch. Notably, we design a cross-attention-based denoising decoder that allows for adjustable optimization difficulty in the self-supervised task, facilitating more effective self-supervised pre-training. Furthermore, the entire model is optimized in an auto-regressive manner to obtain transferable representations. Extensive experiments demonstrate that TimeDART achieves state-of-the-art fine-tuning performance compared to the most advanced competitive methods in forecasting tasks. Our code is publicly available at https://github.com/Melmaphother/TimeDART.
Machine Learning
What problem does this paper attempt to address?
This paper addresses two main problems in time series forecasting:

1. **The gap between pre-training and fine-tuning tasks**: Existing self-supervised methods leave a large gap between the pre-training stage and downstream tasks, so representations learned during pre-training transfer poorly during fine-tuning. For example, masking changes the data distribution seen during pre-training, which makes the pre-training and fine-tuning tasks inconsistent. Contrastive methods, in turn, struggle to construct positive and negative sample pairs, because temporal dependence makes similarity between time series hard to define.
2. **Simultaneously capturing global dependencies and local features**: Existing self-supervised methods often fail to capture long-range dependencies and fine-grained local features of time series data at the same time. This limits the model's ability to learn comprehensive, highly expressive time series representations.

To solve these problems, the authors propose a method named TimeDART, which works as follows:

- **Generative self-supervised framework**: TimeDART adopts a generative self-supervised framework that combines autoregressive generation with a denoising diffusion model. The autoregressive component captures the global dependencies of the series, while the denoising diffusion component models the detailed features of local regions. Combining the two lets TimeDART learn global and local features simultaneously during self-supervised pre-training.
- **Self-attention mechanism**: TimeDART uses a self-attention-based Transformer encoder to model the dependencies between time series patches, which helps capture long-range dependencies.
- **Denoising diffusion module**: The authors design a cross-attention-based denoising network that allows the optimization difficulty of the self-supervised task to be adjusted. According to the paper, this design strengthens the model's ability to capture local features and thereby makes pre-training more effective.
- **Autoregressive optimization**: The entire model is optimized in an autoregressive manner to obtain transferable representations. This is more consistent with the paradigm of time series forecasting and narrows the gap between pre-training and fine-tuning.

With these design choices, TimeDART is reported to achieve state-of-the-art performance on multiple time series forecasting tasks, demonstrating its ability to capture complex time series patterns.
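
The patching, noising, and causal-ordering ideas summarized above can be sketched in a few lines of plain Python. This is an illustrative sketch only and does not reproduce the authors' implementation. It assumes non-overlapping patches, a standard DDPM-style forward process x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with a linear beta schedule, and an independently drawn noise step per patch. All function names and hyperparameter values (`make_patches`, `alpha_bar_schedule`, `add_noise`, `causal_mask`, `patch_len=8`, `num_steps=100`) are hypothetical choices made for illustration.

```python
import math
import random


def make_patches(series, patch_len):
    """Split a 1-D series into non-overlapping patches, dropping any remainder."""
    n = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n)]


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def add_noise(patch, alpha_bar, rng):
    """Forward diffusion on one patch: sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    scale, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [scale * x + sigma * rng.gauss(0.0, 1.0) for x in patch]


def causal_mask(num_patches):
    """mask[i][j] is True when patch i may attend to patch j (j <= i)."""
    return [[j <= i for j in range(num_patches)] for i in range(num_patches)]


rng = random.Random(0)
series = [math.sin(0.1 * i) for i in range(64)]  # toy univariate series
patches = make_patches(series, patch_len=8)
alpha_bars = alpha_bar_schedule(num_steps=100)

# Each patch draws its own noise step independently (an assumption of this sketch).
steps = [rng.randrange(len(alpha_bars)) for _ in patches]
noisy = [add_noise(p, alpha_bars[t], rng) for p, t in zip(patches, steps)]

# Left-to-right attention pattern over patches, as in autoregressive training.
mask = causal_mask(len(patches))
```

In a full model, the clean patches would feed a causally masked Transformer encoder, and a decoder would be trained to recover the clean patches from `noisy` while attending to the encoder output. The noise schedule is one plausible knob for adjusting how difficult the denoising task is.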