DTWSSE: Data Augmentation with a Siamese Encoder for Time Series

Xinyu Yang, Xinlan Zhang, Zhenguo Zhang, Yahui Zhao, Rongyi Cui
DOI: https://doi.org/10.1007/978-3-030-85896-4_34
2021-08-23
Abstract: Access to labeled time series data is often limited in the real world, which constrains the performance of deep learning models in the field of time series analysis. Data augmentation is an effective way to solve the problem of small sample size and imbalance in time series datasets. The two key factors of data augmentation are the distance metric and the choice of interpolation method. SMOTE does not perform well on time series data because it uses a Euclidean distance metric and interpolates directly on the object. Therefore, we propose DTWSSE, a DTW-based synthetic minority oversampling technique that uses a Siamese encoder for interpolation. In order to reasonably measure the distance between time series, DTW, which has been verified to be an effective method for time series, is employed as the distance metric. To adapt to the DTW metric, we use an autoencoder trained in an unsupervised self-training manner for interpolation. The encoder is a Siamese neural network that maps the time series data from the DTW hidden space to the Euclidean deep feature space, and the decoder maps the deep feature space back to the DTW hidden space. We validate the proposed method on a number of different balanced or unbalanced time series datasets. Experimental results show that the proposed method can lead to better performance of the downstream deep learning model.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the following problem: in time-series analysis, labeled data is expensive to obtain and sample sizes are imbalanced across categories, so deep-learning models are prone to overfitting or to ignoring the minority class. To address small sample sizes and class imbalance, the paper proposes DTWSSE, a synthetic minority oversampling technique for time series based on dynamic time warping (DTW) and a Siamese encoder.

### Specific problem description

1. **Problems of small samples and class imbalance**:
   - In practical applications, the cost of obtaining labeled time-series data is relatively high.
   - Sample sizes differ widely among categories, which may cause the deep-learning model to overfit or to ignore the minority class during training.
2. **Limitations of existing methods**:
   - Traditional data augmentation methods such as SMOTE use a Euclidean distance metric and interpolate directly on the samples. This does not represent the similarity between time series well and may destroy their internal temporal correlation.

### Proposed solution

The paper proposes the DTWSSE method, which addresses the above problems in the following ways:

- **Distance measurement**: Use dynamic time warping (DTW) as the distance metric to better capture the similarity between time series.
- **Interpolation method**: Use an autoencoder for interpolation. The encoder is a Siamese neural network that maps the time series from the DTW hidden space to the Euclidean deep-feature space; the decoder maps the deep-feature space back to the DTW hidden space.

In this way, DTWSSE can generate new synthetic samples while preserving the temporal characteristics of the time series, thereby improving the performance of downstream deep-learning models.
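
The interpolation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: `encode` and `decode` are hypothetical callables standing in for the trained Siamese encoder and the decoder, and the neighbour count `k` and the uniformly drawn interpolation weight follow standard SMOTE practice, which is an assumption here.

```python
import math
import random


def synthesize(minority, encode, decode, k=2, rng=None):
    """Create one synthetic sample by SMOTE-style interpolation in feature space.

    minority: list of minority-class time series (each a list of floats).
    encode / decode: hypothetical stand-ins for the trained encoder and decoder.
    k: number of nearest neighbours to choose from (assumed, as in SMOTE).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = rng or random.Random(0)

    # Map every series into the Euclidean deep-feature space.
    feats = [encode(s) for s in minority]

    # Pick a seed sample and rank the others by Euclidean feature distance.
    i = rng.randrange(len(feats))
    others = [j for j in range(len(feats)) if j != i]
    others.sort(key=lambda j: math.dist(feats[i], feats[j]))

    # Interpolate between the seed and one of its k nearest neighbours.
    j = rng.choice(others[:k])
    lam = rng.random()
    mixed = [a + lam * (b - a) for a, b in zip(feats[i], feats[j])]

    # Map the interpolated feature vector back to a time series.
    return decode(mixed)
```

With identity functions in place of the networks (`encode = decode = list`), the sketch reduces to plain SMOTE on the raw series, which makes the role of the learned feature space explicit: only there is Euclidean interpolation expected to respect DTW similarity.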

### Mathematical formulas

- **DTW distance calculation**: the local cost between point \(i\) of \(X_q\) and point \(j\) of \(X_s\), the cost along a warping path \(W = \big((i_1, j_1), \dots, (i_P, j_P)\big)\) of length \(P\), and the minimum over all warping paths:

  \[ D_{ij} = \|x_i^q - x_j^s\|_2^2 \]

  \[ d(X_q, X_s) = \sum_{t=1}^{P} D_{i_t j_t} \]

  \[ \mathrm{DTW}(X_q, X_s) = \min_{W} \left\{ \sum_{t=1}^{P} D_{i_t j_t} \right\} \]

- **Cumulative distance calculation**:

  \[ \mathrm{dp}(i, j) = D_{ij} + \min\{\mathrm{dp}(i-1, j),\ \mathrm{dp}(i, j-1),\ \mathrm{dp}(i-1, j-1)\} \]

- **Loss function**:
  - Encoder training loss:

    \[ L_E = \frac{1}{|D|} \sum_i \left( \|h_i^1 - h_i^2\|_2 - y_i \right)^2 \]

  - Decoder training loss:

    \[ L_D = \frac{1}{2|D|} \sum_i \|S_i^1 - \mathrm{RecS}_i^1\|_2^2 + \frac{1}{2|D|} \sum_i \|S_i^2 - \mathrm{RecS}_i^2\|_2^2 \]

### Summary

By combining the DTW distance metric with Siamese-encoder interpolation, the paper addresses small sample sizes and class imbalance in time-series data augmentation. The reported experiments indicate improved performance of downstream deep-learning models.
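
The cumulative-distance recurrence and the encoder loss listed under the mathematical formulas can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's code: the series are univariate, so the squared norm reduces to a squared difference, and `y` is assumed to be the target distance for each pair (presumably the DTW distance between the two input series, although the summary does not say so explicitly). The function names are chosen here for illustration.

```python
import math


def dtw_distance(xq, xs):
    """DTW distance between two univariate series with squared local cost."""
    n, m = len(xq), len(xs)
    # dp[i][j] is the minimal cumulative cost of aligning xq[:i] with xs[:j].
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (xq[i - 1] - xs[j - 1]) ** 2  # D_ij
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def encoder_loss(pairs):
    """Mean squared gap between feature distance and target distance.

    pairs: list of (h1, h2, y), where h1 and h2 are the two encoder outputs
    and y is the target distance for that pair (assumed to be a DTW distance).
    """
    total = 0.0
    for h1, h2, y in pairs:
        feat_dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))
        total += (feat_dist - y) ** 2
    return total / len(pairs)
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is `0.0` because warping absorbs the repeated point, whereas a pointwise Euclidean comparison is not even defined for series of different lengths.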