Abstract: Access to labeled time series data is often limited in the real world, which constrains the performance of deep learning models in the field of time series analysis. Data augmentation is an effective way to solve the problem of small sample size and imbalance in time series datasets. The two key factors of data augmentation are the distance metric and the choice of interpolation method. SMOTE does not perform well on time series data because it uses a Euclidean distance metric and interpolates directly on the object. Therefore, we propose a DTW-based synthetic minority oversampling technique using a Siamese encoder for interpolation, named DTWSSE. In order to reasonably measure the distance between time series, DTW, which has been verified to be an effective method for time series, is employed as the distance metric. To adapt to the DTW metric, we use an autoencoder trained in an unsupervised self-training manner for interpolation. The encoder is a Siamese neural network that maps the time series data from the DTW hidden space to the Euclidean deep feature space, and the decoder maps the deep feature space back to the DTW hidden space. We validate the proposed method on a number of different balanced and unbalanced time series datasets. Experimental results show that the proposed method can lead to better performance of the downstream deep learning model.

What problem does this paper attempt to address?

The problem this paper attempts to solve: in time-series analysis, labeled time-series data is costly to obtain and sample sizes are imbalanced across categories, so deep-learning models are prone to overfitting or to ignoring the minority class. To address small samples and class imbalance, the paper proposes DTWSSE, a time-series synthetic minority oversampling technique based on dynamic time warping (DTW) and a Siamese encoder, as an improved time-series data augmentation method.

### Specific problem description

1. **Problems of small samples and class imbalance**:
   - In practical applications, the cost of obtaining labeled time-series data is relatively high.
   - Sample sizes differ widely among categories, which may lead the deep-learning model to overfit or to ignore the minority class during training.
2. **Limitations of existing methods**:
   - Traditional data augmentation methods such as SMOTE use a Euclidean distance measure and direct interpolation, which do not represent the similarity between time series well and may destroy the internal temporal correlation of a time series.

### Proposed solution

The paper proposes the DTWSSE method, which aims to solve the above problems in the following ways:

- **Distance measurement**: Use dynamic time warping (DTW) as the distance measure to better capture the similarity between time series.
- **Interpolation method**: Use an autoencoder for interpolation. The encoder is a Siamese neural network that maps the time series from the DTW latent space to the Euclidean deep-feature space; the decoder maps the deep-feature space back to the DTW latent space.

In this way, DTWSSE can generate new synthetic samples while maintaining the temporal characteristics of the time series, thereby improving the performance of downstream deep-learning models.
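The interpolation step described above can be sketched as SMOTE-style interpolation carried out in the latent space rather than on the raw series. This is a minimal illustration, not the paper's implementation: `encode` and `decode` are hypothetical callables standing in for the trained Siamese encoder and the decoder, and the neighbour count `k` and the uniform choice of the interpolation weight are assumptions borrowed from standard SMOTE.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two latent vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def interpolate_latent(h_a: Sequence[float], h_b: Sequence[float], lam: float) -> Vector:
    """Point on the segment from h_a to h_b; lam=0 gives h_a, lam=1 gives h_b."""
    return [a + lam * (b - a) for a, b in zip(h_a, h_b)]


def synthesize(
    minority: Sequence[Sequence[float]],
    encode: Callable[[Sequence[float]], Vector],  # hypothetical: trained encoder
    decode: Callable[[Vector], Vector],           # hypothetical: trained decoder
    k: int = 3,                                   # assumed neighbour count
    rng: Optional[random.Random] = None,
) -> Vector:
    """Create one synthetic minority sample by interpolating in latent space."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = rng or random.Random(0)
    latents = [encode(s) for s in minority]

    # Pick a seed sample, then one of its k nearest neighbours in latent space.
    i = rng.randrange(len(latents))
    neighbours = sorted(
        (j for j in range(len(latents)) if j != i),
        key=lambda j: euclidean(latents[i], latents[j]),
    )[:k]
    j = rng.choice(neighbours)

    # Interpolate between the two latent vectors and map back to series space.
    lam = rng.random()
    return decode(interpolate_latent(latents[i], latents[j], lam))
```

With identity functions passed as `encode` and `decode`, the sketch reduces to plain SMOTE on the raw series, which is the baseline the paper argues against. The intended benefit comes from an encoder whose Euclidean distances approximate DTW distances.
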
### Mathematical formulas

- **DTW distance calculation**, where \(W = ((i_1, j_1), \dots, (i_P, j_P))\) is a warping path of length \(P\) aligning \(X_q\) with \(X_s\):

  \[ D_{ij}=\|x_i^q - x_j^s\|_2^2 \]

  \[ d(X_q, X_s)=\sum_{t = 1}^P D_{i_t j_t} \]

  \[ DTW(X_q, X_s)=\min_W\left\{\sum_{t = 1}^P D_{i_t j_t}\right\} \]

- **Cumulative distance calculation**:

  \[ dp(i,j)=D_{ij}+\min\{dp(i - 1,j), dp(i,j - 1), dp(i - 1,j - 1)\} \]

- **Loss functions**:
  - Encoder training loss:

    \[ L_E=\frac{1}{|D|}\sum_i\left(\|h_i^1 - h_i^2\|_2 - y_i\right)^2 \]

  - Decoder training loss:

    \[ L_D=\frac{1}{2|D|}\sum_i\left(\|S_i^1 - RecS_i^1\|_2^2\right)+\frac{1}{2|D|}\sum_i\left(\|S_i^2 - RecS_i^2\|_2^2\right) \]

### Summary

By combining the DTW distance measure with Siamese-encoder interpolation, the paper addresses the problems of small samples and class imbalance in time-series data augmentation; its experiments indicate improved performance of downstream deep-learning models.
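The cumulative-distance recurrence and the encoder loss listed under the mathematical formulas can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the series are univariate (each point is a scalar, so \(D_{ij}\) reduces to a squared difference), no warping-window constraint is applied, and the target \(y_i\) in the encoder loss is taken to be the DTW distance of the \(i\)-th pair. The function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def dtw_distance(xq: Sequence[float], xs: Sequence[float]) -> float:
    """DTW distance via dp(i,j) = D_ij + min(dp(i-1,j), dp(i,j-1), dp(i-1,j-1))."""
    n, m = len(xq), len(xs)
    inf = float("inf")
    # dp has one extra row and column so that the boundary cases are uniform.
    dp: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (xq[i - 1] - xs[j - 1]) ** 2  # D_ij for scalar points
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def encoder_loss(
    h1: Sequence[Sequence[float]],
    h2: Sequence[Sequence[float]],
    y: Sequence[float],
) -> float:
    """Mean over pairs of (||h1_i - h2_i||_2 - y_i)^2, matching L_E."""
    total = 0.0
    for a, b, target in zip(h1, h2, y):
        dist = sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
        total += (dist - target) ** 2
    return total / len(y)
```

Under these assumptions, a training pair would contribute `y_i = dtw_distance(S1_i, S2_i)`, so minimising `encoder_loss` pushes Euclidean distances between embeddings towards the DTW distances between the original series.
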

DTWSSE: Data Augmentation with a Siamese Encoder for Time Series
