Abstract: The recent breakthrough of Transformers in deep learning has drawn significant attention from the time series community because of their ability to capture long-range dependencies. However, like other deep learning models, Transformers face limitations in time series prediction, including insufficient temporal understanding, generalization challenges, and data shift issues in domains with limited data. Catastrophic forgetting, where a model forgets previously learned information when exposed to new data, is another issue that must be addressed to make Transformers robust for time series tasks. To address these limitations, we pre-train a time series Transformer model on a source domain with sufficient data and fine-tune it on a target domain with limited data. We introduce the *One-step fine-tuning* approach, which adds a percentage of source domain data to the target domain, providing the model with diverse time series instances. We then fine-tune the pre-trained model using a gradual unfreezing technique, which improves the model's performance in time series prediction for domains with limited data. Extensive experiments on two real-world datasets show that our approach improves over the state-of-the-art baselines by 4.35% and 11.54% for indoor temperature and wind power prediction, respectively.

What problem does this paper attempt to address?

The paper targets time series forecasting in target domains with limited data, and improves prediction performance there through pre-training and fine-tuning. The main problems it attempts to solve are:

1. **Data Scarcity**: In some practical application scenarios, the amount of time series data available for training is very limited, so the model cannot fully learn the patterns and dependencies in the data.
2. **Data Distribution Shift (Data Drift)**: The data distribution may differ significantly between the source domain and the target domain, so a model that performs well in the source domain can perform poorly in the target domain.
3. **Insufficient Generalization**: Even if the model performs well on the training data, it may struggle to predict new, unseen data accurately, especially when the data are not independent and identically distributed (non-i.i.d.).
4. **Catastrophic Forgetting**: During continual learning or domain adaptation, the model may forget previously learned knowledge, so performance on old tasks declines as new tasks are introduced.

To address these challenges, the paper proposes a method called "One-step fine-tuning", which consists of the following steps:

- **Pre-training**: Pre-train a Transformer-based time series model on a data-rich source domain to learn a general representation of time series features.
- **Adding Source Domain Data**: During the fine-tuning phase, add a certain proportion of source domain data to the target domain data. This helps mitigate data scarcity and data drift, and reduces the risk of catastrophic forgetting.
- **Gradual Unfreezing (GU)**: Fine-tune with the GU technique, which freezes some layers of the model initially and then gradually unfreezes them for training.

This method helps retain the knowledge of the source domain model while allowing the model to adapt to the target domain. Experimental results show that it improves performance in target domains with limited data: on indoor temperature prediction and wind power prediction, it improves over the state-of-the-art baselines by 4.35% and 11.54%, respectively. In summary, the paper addresses several common challenges in time series forecasting by combining pre-training, mixing source domain data into the target data, and a gradual unfreezing fine-tuning strategy.
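The two fine-tuning ingredients described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `mix_source_into_target` and `gradual_unfreezing_schedule`, the parameter `source_fraction`, and the layer names are hypothetical. The summary does not say how the percentage is defined or in which order layers are unfrozen, so the sketch assumes the fraction is taken of the source set and that layers are unfrozen one at a time starting from the output side.

```python
import random


def mix_source_into_target(source, target, source_fraction, seed=0):
    """Return the target samples plus a random subset of the source samples.

    Assumption: source_fraction is the share of the *source* set to add.
    """
    if not 0.0 <= source_fraction <= 1.0:
        raise ValueError("source_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_extra = int(round(source_fraction * len(source)))
    mixed = list(target) + rng.sample(list(source), n_extra)
    rng.shuffle(mixed)  # interleave source and target samples
    return mixed


def gradual_unfreezing_schedule(layer_names, epochs_per_stage=1):
    """Return a list of (epoch, trainable_layers) pairs.

    Assumption: one more layer becomes trainable at each stage, starting
    from the last layer (closest to the output) and moving toward the input.
    """
    schedule = []
    epoch = 0
    for stage in range(1, len(layer_names) + 1):
        trainable = list(layer_names[-stage:])
        for _ in range(epochs_per_stage):
            schedule.append((epoch, trainable))
            epoch += 1
    return schedule


# Hypothetical toy data: 10 source windows, 3 target windows.
source = [("src", i) for i in range(10)]
target = [("tgt", i) for i in range(3)]
mixed = mix_source_into_target(source, target, source_fraction=0.2)

# Hypothetical layer names for a small Transformer forecaster.
layers = ["embedding", "encoder", "decoder", "head"]
for epoch, trainable in gradual_unfreezing_schedule(layers):
    print(epoch, trainable)
```

In a real training loop, each schedule entry would decide which parameter groups receive gradient updates in that epoch, and all other layers would stay frozen at their pre-trained values.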

Domain Adaptation for Time series Transformers using One-step fine-tuning
