SimPSI: A Simple Strategy to Preserve Spectral Information in Time Series Data Augmentation

Hyun Ryu, Sunjae Yoon, Hee Suk Yoon, Eunseop Yoon, Chang D. Yoo
2023-12-10
Abstract: Data augmentation is a crucial component in training neural networks to overcome the limitation imposed by data size, and several techniques have been studied for time series. Although these techniques are effective in certain tasks, they have yet to be generalized to time series benchmarks. We find that current data augmentation techniques ruin the core information contained within the frequency domain. To address this issue, we propose a simple strategy to preserve spectral information (SimPSI) in time series data augmentation. SimPSI preserves the spectral information by mixing the original and augmented input spectrum weighted by a preservation map, which indicates the importance score of each frequency. Specifically, our experimental contributions are to build three distinct preservation maps: magnitude spectrum, saliency map, and spectrum-preservative map. We apply SimPSI to various time series data augmentations and evaluate its effectiveness across a wide range of time series benchmarks. Our experimental results support that SimPSI considerably enhances the performance of time series data augmentations by preserving core spectral information. The source code used in the paper is available at https://github.com/Hyun-Ryu/simpsi.
Machine Learning, Artificial Intelligence, Signal Processing
What problem does this paper attempt to address?
This paper addresses the following problem: current time-series data augmentation techniques are effective on some specific tasks but generalize poorly across time-series benchmarks. Specifically, these techniques destroy core information that the original data carries in the frequency domain, which degrades model performance. To solve this problem, the authors propose SimPSI, a simple strategy to preserve spectral information, which keeps the core frequency-domain information by mixing the original spectrum and the augmented spectrum with per-frequency weights.

### Background of the Paper

Time-series data plays a crucial role in fields such as medicine, physiology, and sensor devices. However, only a limited number of samples can be collected for each type of data, which restricts the performance and capabilities of neural networks. Researchers therefore use data augmentation to increase the number of samples and improve model performance. Existing time-series data augmentation methods include jittering, scaling, magnitude warping, time warping, and permutation, but their performance is inconsistent across tasks, and they often introduce biases in the frequency domain, resulting in the loss of core information.

### Proposal of SimPSI

To overcome the limitations of existing methods, the authors propose SimPSI. Its core idea is to mix the original spectrum and the augmented spectrum, weighted by a "preservation map", so that important information in the frequency domain is preserved. The specific steps are as follows:

1. **Spectrum Transformation**: Transform the original time series \(x_t\) and the augmented time series \(x'_t\) into the spectra \(x_f\) and \(x'_f\), respectively, using the Fast Fourier Transform (FFT).
2. **Define the Preservation Map**: Generate a preservation map \(P\) with the same length as the spectrum, in which the importance score of each frequency component lies between 0 and 1.
3. **Spectrum Mixing**: Mix the original spectrum and the augmented spectrum, weighted by the preservation map \(P\), to obtain the information-preserved spectrum \(\tilde{x}_f\): \[ \tilde{x}_f=(1_C\cdot P^T)\odot x_f+(1_C\cdot(1_L - P)^T)\odot x'_f \] Here \(\odot\) is element-wise multiplication, and \(1_C\) and \(1_L\) are all-ones vectors of lengths \(C\) and \(L\) (taken here to be the number of channels and the spectrum length), so the same map \(P\) is applied to every channel. A score near 1 keeps the original value at that frequency, and a score near 0 keeps the augmented value.
4. **Inverse Transformation**: Transform \(\tilde{x}_f\) back to the time domain using the Inverse Fast Fourier Transform (IFFT) to obtain the final augmented time series \(\tilde{x}_t\).

### Definition of the Preservation Map

The paper proposes three different methods for generating the preservation map:

1. **Magnitude Spectrum**: Assumes that frequency components with larger magnitudes are more important; the map is the normalized magnitude of the input spectrum.
2. **Saliency Map**: Uses the absolute value of the classifier's gradient with respect to the input spectrum to determine which frequency components matter most for the task.
3. **Spectrum-Preservative Map**: Learns the preservation map with a generator network that takes the spectrum as input and is optimized with a contrastive loss function.

### Experimental Results

The experimental results show that SimPSI considerably improves the effectiveness of time-series data augmentation on tasks such as signal demodulation, human activity recognition, and sleep-stage detection. By preserving the core information in the frequency domain, SimPSI prevents unintended information loss during augmentation and thereby improves model performance.
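The four steps listed under "Proposal of SimPSI", combined with the magnitude-spectrum map, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (see their repository for that). The function names `magnitude_preservation_map` and `simpsi_mix` are hypothetical, inputs are assumed to be real-valued arrays of shape `(C, L)`, the one-sided real FFT (`rfft`) stands in for the FFT so the inverse transform is always real, and normalizing by the peak of the channel-averaged magnitude is only one possible reading of "normalization".

```python
import numpy as np


def magnitude_preservation_map(x_t):
    """Hypothetical helper: magnitude-based preservation map with values in [0, 1].

    x_t has shape (C, L). The returned map has one score per one-sided
    frequency bin, i.e. shape (L // 2 + 1,).
    """
    x_f = np.fft.rfft(x_t, axis=-1)      # one-sided complex spectrum
    mag = np.abs(x_f).mean(axis=0)       # assumption: average magnitude over channels
    peak = mag.max()
    # Assumption: divide by the peak so the strongest frequency scores 1.
    return mag / peak if peak > 0 else np.zeros_like(mag)


def simpsi_mix(x_t, x_aug_t, P):
    """Mix original and augmented spectra, weighted by the preservation map P."""
    n = x_t.shape[-1]
    x_f = np.fft.rfft(x_t, axis=-1)          # step 1: original spectrum
    x_aug_f = np.fft.rfft(x_aug_t, axis=-1)  # step 1: augmented spectrum
    # Step 3: P[None, :] broadcasts the map over channels, like 1_C . P^T.
    mixed_f = P[None, :] * x_f + (1.0 - P)[None, :] * x_aug_f
    # Step 4: back to the time domain; irfft returns a real signal of length n.
    return np.fft.irfft(mixed_f, n=n, axis=-1)


# Usage: a single-channel tone at frequency bin 8, augmented by jittering.
rng = np.random.default_rng(0)
t = np.arange(128)
x = np.sin(2 * np.pi * 8 * t / 128)[None, :]     # shape (1, 128)
x_aug = x + 0.5 * rng.standard_normal(x.shape)   # additive-noise augmentation
P = magnitude_preservation_map(x)                # step 2: preservation map
x_tilde = simpsi_mix(x, x_aug, P)                # information-preserved sample
```

With this map, the dominant tone is taken almost entirely from the original signal, while bins where the original has negligible energy are taken from the augmented signal. Setting `P` to all ones returns the original series and setting it to all zeros returns the augmented series, which is a quick sanity check on the mixing rule.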
In conclusion, the main contribution of this paper is SimPSI, a simple and effective strategy for preserving the core information in the frequency domain during time-series data augmentation, thereby improving the generalization ability and performance of the model.