A Systematic Evaluation of Generated Time Series and Their Effects in Self-Supervised Pretraining

Audrey Der, Chin-Chia Michael Yeh, Xin Dai, Huiyuan Chen, Yan Zheng, Yujie Fan, Zhongfang Zhuang, Vivian Lai, Junpeng Wang, Liang Wang, Wei Zhang, Eamonn Keogh
2024-08-15
Abstract: Self-supervised Pretrained Models (PTMs) have demonstrated remarkable performance in computer vision and natural language processing tasks. These successes have prompted researchers to design PTMs for time series data. In our experiments, most self-supervised time series PTMs were surpassed by simple supervised models. We hypothesize this undesired phenomenon may be caused by data scarcity. In response, we test six time series generation methods, use the generated data in pretraining in lieu of the real data, and examine the effects on classification performance. Our results indicate that replacing a real-data pretraining set with a greater volume of only generated samples produces noticeable improvement.
Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is: **Self-supervised pretrained models (PTMs) perform poorly in time series classification tasks, especially when data is scarce**. Specifically, the authors observe that most self-supervised time series PTMs perform worse than simple supervised models. They hypothesize that data scarcity may cause this and propose pretraining on generated time series data, replacing or supplementing the real data, to improve model performance.

### Detailed Problem Description

1. **Background and Motivation**:
   - Self-supervised PTMs have achieved remarkable success in computer vision and natural language processing tasks.
   - These successes have prompted researchers to design PTMs for time series data.
   - However, in the authors' experiments, most self-supervised time series PTMs perform worse than simple supervised models.
2. **Hypothesis**:
   - The authors hypothesize that this undesired phenomenon may be caused by data scarcity.
3. **Solution**:
   - To test this hypothesis, the authors evaluate six time series generation methods and use the generated data for pretraining in place of the real data.
   - They then examine the effect of the generated data on time series classification performance.
4. **Research Objectives**:
   - Explore whether pretraining with generated time series data can improve performance on time series classification tasks.
   - Compare the effects of different combinations of generation methods and pretraining methods.

### Method Overview

- **Generation Methods**: Random Walk, Sinusoidal Wave, Multivariate Gaussian, Generative Adversarial Network (GAN), β-Variational Auto-Encoder (β-VAE), and Diffusion Model.
- **Pretraining Methods**: TimeCLR, TS2Vec, MixingUp, and TF-C.
- **Network Architectures**: ResNet and Transformer.

### Main Findings

- Pretraining with generated time series data can noticeably improve model performance, especially when data is scarce.
- Advanced generation models (such as GAN, β-VAE, and Diffusion Model) perform better than simple generation methods (such as Random Walk, Sinusoidal Wave, and Multivariate Gaussian), but the difference is not significant.
- The ResNet architecture performs better than the Transformer architecture in time series classification tasks.

### Conclusion

This paper systematically evaluates generated time series data and its use in self-supervised pretraining. Its results indicate that generated data can alleviate the data scarcity problem to a certain extent and improve performance on time series classification tasks.
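
The three simple generation methods named in the Method Overview (Random Walk, Sinusoidal Wave, Multivariate Gaussian) can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the frequency, phase, and amplitude ranges, the exponentially decaying covariance, and the per-series z-normalization are all hypothetical choices, since the summary does not specify how the generators are parameterized.

```python
import numpy as np


def random_walk(n_series, length, rng):
    """Random walk: cumulative sum of i.i.d. standard-normal steps."""
    steps = rng.standard_normal((n_series, length))
    return np.cumsum(steps, axis=1)


def sinusoidal_wave(n_series, length, rng):
    """Sine waves with a random frequency, phase, and amplitude per series.

    The sampling ranges below are assumptions chosen for illustration.
    """
    t = np.linspace(0.0, 1.0, length)
    freq = rng.uniform(1.0, 10.0, size=(n_series, 1))  # cycles per window
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_series, 1))
    amp = rng.uniform(0.5, 2.0, size=(n_series, 1))
    return amp * np.sin(2.0 * np.pi * freq * t + phase)


def multivariate_gaussian(n_series, length, rng, length_scale=10.0):
    """Zero-mean Gaussian samples whose covariance decays with time lag.

    The covariance exp(-|lag| / length_scale) is a hand-picked assumption;
    a covariance estimated from real data could be substituted.
    """
    idx = np.arange(length)
    lag = np.abs(idx[:, None] - idx[None, :])
    cov = np.exp(-lag / length_scale) + 1e-6 * np.eye(length)  # jitter
    chol = np.linalg.cholesky(cov)
    # If z ~ N(0, I), then z @ chol.T ~ N(0, cov).
    return rng.standard_normal((n_series, length)) @ chol.T


def z_normalize(x):
    """Scale each series to zero mean and unit variance."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + 1e-8)


rng = np.random.default_rng(0)
pretrain_set = z_normalize(
    np.concatenate(
        [
            random_walk(100, 128, rng),
            sinusoidal_wave(100, 128, rng),
            multivariate_gaussian(100, 128, rng),
        ],
        axis=0,
    )
)
print(pretrain_set.shape)
```

Because these generators need no real data, the size of `pretrain_set` is limited only by compute, which is what makes it possible to pretrain on a larger volume of generated samples than the real pretraining set provides.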