Domain Adaptation for Time series Transformers using One-step fine-tuning

Subina Khanal, Seshu Tirupathi, Giulio Zizzo, Ambrish Rawat, Torben Bach Pedersen
2024-01-12
Abstract: The recent breakthrough of Transformers in deep learning has drawn significant attention of the time series community due to their ability to capture long-range dependencies. However, like other deep learning models, Transformers face limitations in time series prediction, including insufficient temporal understanding, generalization challenges, and data shift issues for the domains with limited data. Additionally, addressing the issue of catastrophic forgetting, where models forget previously learned information when exposed to new data, is another critical aspect that requires attention in enhancing the robustness of Transformers for time series tasks. To address these limitations, in this paper, we pre-train the time series Transformer model on a source domain with sufficient data and fine-tune it on the target domain with limited data. We introduce the One-step fine-tuning approach, adding some percentage of source domain data to the target domains, providing the model with diverse time series instances. We then fine-tune the pre-trained model using a gradual unfreezing technique. This helps enhance the model's performance in time series prediction for domains with limited data. Extensive experimental results on two real-world datasets show that our approach improves over the state-of-the-art baselines by 4.35% and 11.54% for indoor temperature and wind power prediction, respectively.
Machine Learning
What problem does this paper attempt to address?
The paper primarily addresses several key issues in time series forecasting, especially for target domains with limited data, by improving prediction performance through pre-training and fine-tuning methods. The main problems the paper attempts to solve are:

1. **Data Scarcity**: In some practical application scenarios, the amount of time series data available for training may be very limited. This leaves the model unable to fully learn the patterns and dependencies in the data during training.
2. **Data Distribution Shift (Data Drift)**: There may be significant differences in data distribution between the source domain and the target domain. Such differences can cause the model to perform well in the source domain but poorly in the target domain.
3. **Insufficient Generalization**: Even if the model performs well on the training data, it may struggle to accurately predict new, unseen data, especially on non-independent and identically distributed (non-i.i.d.) data.
4. **Catastrophic Forgetting**: During continual learning or domain adaptation, the model may forget previously learned knowledge, and the performance on old tasks may gradually decline when new tasks are introduced.

To address these challenges, the paper proposes a method called "One-step fine-tuning". Specifically, the method includes the following steps:

- **Pre-training**: First, pre-train a Transformer-based time series model on a data-rich source domain to learn a general representation of time series features.
- **Adding Source Domain Data**: During the fine-tuning phase, add a certain proportion of source domain data to the target domain data, which helps to mitigate the problems of data scarcity and data drift, and reduces the risk of catastrophic forgetting.
- **Gradual Unfreezing (GU)**: Employ the GU technique for fine-tuning, which involves freezing some layers of the model initially and then gradually unfreezing these layers for training.
This method helps to retain the knowledge learned on the source domain while allowing the model to adapt to the target domain. Experiments on two real-world datasets show that it improves over state-of-the-art baselines by 4.35% for indoor temperature prediction and 11.54% for wind power prediction in target domains with limited data. In summary, the paper tackles data scarcity, distribution shift, weak generalization, and catastrophic forgetting in time series forecasting by combining pre-training on a data-rich source domain, mixing source data into the target data, and fine-tuning with gradual unfreezing.
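The two fine-tuning ingredients summarized above (mixing source data into the target set, then unfreezing layers gradually) can be illustrated with a small, framework-free sketch. This is not the authors' implementation. The function names, the `source_fraction` and `layers_per_stage` parameters, and the choice to unfreeze from the output side toward the input are all assumptions made for illustration; the text only states that "a certain proportion" of source data is added and that frozen layers are gradually unfrozen.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_source_into_target(
    source: Sequence[T],
    target: Sequence[T],
    source_fraction: float,
    seed: int = 0,
) -> List[T]:
    """Return all target samples plus a random subset of the source samples.

    `source_fraction` is a hypothetical knob: the share of the source set
    that is added to the target set before fine-tuning.
    """
    if not 0.0 <= source_fraction <= 1.0:
        raise ValueError("source_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_extra = int(len(source) * source_fraction)
    extra = rng.sample(list(source), n_extra)  # sampled without replacement
    mixed = list(target) + extra
    rng.shuffle(mixed)
    return mixed


def trainable_layers(
    num_layers: int, stage: int, layers_per_stage: int = 1
) -> List[int]:
    """Indices of the layers that are trainable at a given fine-tuning stage.

    Assumption: stage 0 trains only the layer(s) closest to the output, and
    each later stage unfreezes `layers_per_stage` more layers toward the input.
    """
    n_unfrozen = min(num_layers, (stage + 1) * layers_per_stage)
    return list(range(num_layers - n_unfrozen, num_layers))


# Toy usage with placeholder samples tagged by domain.
source_windows = [("source", i) for i in range(100)]
target_windows = [("target", i) for i in range(10)]

# 10 target samples + 25 source samples
mixed = mix_source_into_target(source_windows, target_windows, source_fraction=0.25)

# One entry per stage: which layer indices would be trainable in a 4-layer model.
schedule = [trainable_layers(num_layers=4, stage=s) for s in range(4)]
```

In a real training loop, each stage of `schedule` would correspond to marking only the listed layers as trainable before running further epochs on `mixed`. How many epochs each stage runs and which layers are grouped together are design choices this sketch leaves open.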