Abstract: Self-supervised Pretrained Models (PTMs) have demonstrated remarkable performance in computer vision and natural language processing tasks. These successes have prompted researchers to design PTMs for time series data. In our experiments, most self-supervised time series PTMs were surpassed by simple supervised models. We hypothesize this undesired phenomenon may be caused by data scarcity. In response, we test six time series generation methods, use the generated data in pretraining in lieu of the real data, and examine the effects on classification performance. Our results indicate that replacing a real-data pretraining set with a greater volume of only generated samples produces noticeable improvement.

What problem does this paper attempt to address?

The main problem this paper addresses is: **self-supervised pre-training models (PTMs) perform poorly on time-series classification tasks, especially when data is scarce**. Specifically, the authors observe that most self-supervised time-series PTMs perform worse than simple supervised models. They hypothesize that data scarcity may be the cause, and they propose pre-training on generated time-series data in place of real data to improve model performance.

### Detailed Problem Description

1. **Background and Motivation**:
   - Self-supervised pre-training models (PTMs) have achieved remarkable success in computer vision and natural language processing tasks.
   - These successes have prompted researchers to design PTMs for time-series data.
   - In the authors' experiments, however, most self-supervised time-series PTMs perform worse than simple supervised models.
2. **Hypothesis**:
   - The authors hypothesize that this undesired result may be caused by data scarcity.
3. **Solution**:
   - To test this hypothesis, the authors evaluate six time-series generation methods and use the generated data for pre-training instead of the real data.
   - They then measure the effect of the generated data on time-series classification performance.
4. **Research Objectives**:
   - Explore whether pre-training with generated time-series data can improve performance on time-series classification tasks.
   - Compare the combined effects of different generation methods and pre-training methods.

### Method Overview

- **Generation Methods**: Random Walk, Sinusoidal Wave, Multivariate Gaussian, Generative Adversarial Network (GAN), β-Variational Auto-Encoder (β-VAE), and Diffusion Model.
- **Pre-training Methods**: TimeCLR, TS2Vec, MixingUp, and TF-C.
- **Network Architectures**: ResNet and Transformer.

### Main Findings

- Pre-training with generated time-series data can noticeably improve model performance, especially when data is scarce.
- Advanced generation models (such as GAN, β-VAE, and Diffusion Model) perform better than simple generation models (such as Random Walk, Sinusoidal Wave, and Multivariate Gaussian), but the difference is not significant.
- The ResNet architecture performs better than the Transformer architecture on time-series classification tasks.

### Conclusion

This paper systematically evaluates generated time-series data and its use in self-supervised pre-training, and finds that generated data can alleviate the data scarcity problem to a certain extent and improve performance on time-series classification tasks.
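
To make the three simple generation methods named under Method Overview more concrete, the following is a minimal NumPy sketch of a random walk, a sinusoidal wave, and a multivariate Gaussian generator. It is an illustration only, not the authors' implementation: the function names, the parameter ranges (frequency, amplitude), the squared-exponential covariance and its `length_scale`, and the per-series z-normalization are all assumptions introduced here, since the summary does not specify them.

```python
import numpy as np


def random_walk(n_series, length, rng):
    """Random walk: cumulative sum of i.i.d. standard-normal steps."""
    steps = rng.standard_normal((n_series, length))
    return np.cumsum(steps, axis=1)


def sinusoidal_wave(n_series, length, rng):
    """Sine waves with random frequency, phase, and amplitude.

    The sampling ranges below are assumed for illustration.
    """
    t = np.arange(length) / length  # normalized time in [0, 1)
    freq = rng.uniform(1.0, 10.0, size=(n_series, 1))  # cycles per series
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_series, 1))
    amp = rng.uniform(0.5, 2.0, size=(n_series, 1))
    return amp * np.sin(2.0 * np.pi * freq * t + phase)


def multivariate_gaussian(n_series, length, rng, length_scale=10.0):
    """Each series is one draw from a zero-mean Gaussian over all time steps.

    The squared-exponential covariance is an assumption, not taken from the paper.
    """
    idx = np.arange(length)
    cov = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / length_scale) ** 2)
    cov += 1e-6 * np.eye(length)  # small jitter for numerical stability
    return rng.multivariate_normal(np.zeros(length), cov, size=n_series)


# Hypothetical registry mapping a method name to its generator.
GENERATORS = {
    "random_walk": random_walk,
    "sinusoidal_wave": sinusoidal_wave,
    "multivariate_gaussian": multivariate_gaussian,
}


def z_normalize(x):
    """Scale every series to zero mean and (approximately) unit variance."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + 1e-8)


def build_pretraining_set(method, n_series=1000, length=128, seed=0):
    """Generate an unlabeled pretraining set of shape (n_series, length)."""
    rng = np.random.default_rng(seed)
    return z_normalize(GENERATORS[method](n_series, length, rng))
```

Under these assumptions, a call such as `build_pretraining_set("random_walk", n_series=10000)` yields an unlabeled array that could stand in for the real pretraining set, and raising `n_series` corresponds to the "greater volume of only generated samples" described in the abstract.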

A Systematic Evaluation of Generated Time Series and Their Effects in Self-Supervised Pretraining
