Embarrassingly Simple MixUp for Time-series

Karan Aggarwal, Jaideep Srivastava
2023-04-10
Abstract: Labeling time series data is an expensive task because it requires domain expertise and because the data is dynamic in nature. Hence, we often have to deal with limited labeled data settings. Data augmentation techniques have been successfully deployed in domains like computer vision to make better use of existing labeled data. We adapt one of the most commonly used techniques, MixUp, to the time series domain. Our proposed methods, MixUp++ and LatentMixUp++, use simple modifications to perform interpolation in the raw time series and in the classification model's latent space, respectively. We also extend these methods with semi-supervised learning to exploit unlabeled data. We observe significant improvements of 1% to 15% on time series classification on two public datasets, in both low and high labeled data regimes, with LatentMixUp++.
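The core operation the abstract builds on can be illustrated with plain MixUp on raw time series: for two series x_i and x_j with one-hot labels y_i and y_j, the synthetic sample is λ·x_i + (1 - λ)·x_j with label λ·y_i + (1 - λ)·y_j, where λ is drawn from Beta(α, α). The following is a minimal sketch of standard MixUp only, not of MixUp++, since the paper's "simple modifications" are not described here. The function names, the [time_step][channel] list layout, and the default `alpha=0.2` are hypothetical choices for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: a multivariate time series stored as [time_step][channel].
Series = List[List[float]]


def one_hot(label: int, num_classes: int) -> List[float]:
    """Encode an integer class label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def mixup_pair(x_i: Series, y_i: Sequence[float],
               x_j: Series, y_j: Sequence[float],
               lam: float) -> Tuple[Series, List[float]]:
    """Convex combination of two equal-length series and their label vectors."""
    if len(x_i) != len(x_j):
        raise ValueError("series must have the same number of time steps")
    x_mix = [[lam * a + (1.0 - lam) * b for a, b in zip(step_i, step_j)]
             for step_i, step_j in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix


def mixup_batch(xs: Sequence[Series], labels: Sequence[int], num_classes: int,
                alpha: float = 0.2,
                rng: Optional[random.Random] = None
                ) -> Tuple[List[Series], List[List[float]]]:
    """Mix every sample with a shuffled partner from the same batch.

    Plain MixUp, not MixUp++: one lambda ~ Beta(alpha, alpha) is drawn per
    batch; alpha=0.2 is a placeholder, not a value taken from the paper.
    """
    rng = rng or random.Random()
    ys = [one_hot(label, num_classes) for label in labels]
    partners = list(range(len(xs)))
    rng.shuffle(partners)
    lam = rng.betavariate(alpha, alpha)
    mixed_x, mixed_y = [], []
    for i, j in enumerate(partners):
        x_m, y_m = mixup_pair(xs[i], ys[i], xs[j], ys[j], lam)
        mixed_x.append(x_m)
        mixed_y.append(y_m)
    return mixed_x, mixed_y
```

For example, `mixup_pair([[0.0], [2.0]], [1.0, 0.0], [[4.0], [6.0]], [0.0, 1.0], 0.25)` returns `([[3.0], [5.0]], [0.25, 0.75])`: every time step is interpolated with the same λ, so the mixed series keeps the original length and channel count.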
Machine Learning, Artificial Intelligence
### What problem does this paper attempt to solve?

This paper primarily addresses two core issues in annotating time series data:

1. **High annotation cost**: Because time series data is dynamic and temporal, annotation requires domain experts, which makes the process very time-consuming and expensive.
2. **Limited annotated data**: In fields such as healthcare, where precise annotation is crucial, often only a small amount of annotated data is available.

To tackle these problems, the paper proposes two MixUp-based data augmentation methods for time series: MixUp++ and LatentMixUp++. These methods generate synthetic samples by interpolating data in the raw time series space (MixUp++) and in the latent space of the classification model (LatentMixUp++), respectively. They were validated on two public datasets (human activity recognition and sleep staging). Experimental results show that these methods significantly improve time series classification performance in both low- and high-annotation scenarios. Notably, LatentMixUp++ performs especially well when annotated data is scarce.

Additionally, the paper extends these methods to a semi-supervised learning setting, further leveraging unannotated data through pseudo-labeling. This extension is particularly effective in low-annotation scenarios, where it significantly improves model performance.
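The semi-supervised extension can be sketched by combining two standard ideas: confidence-thresholded pseudo-labeling (unlabeled samples whose top predicted probability clears a threshold receive that class as a label) and MixUp applied to encoder outputs instead of raw series. This is a minimal, hypothetical sketch, not the paper's LatentMixUp++: the summary does not say how LatentMixUp++ modifies plain latent MixUp or how pseudo-labels are selected, so the toy linear head, `threshold=0.9`, `alpha=0.2`, and all function names are assumptions for illustration.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def one_hot(label: int, num_classes: int) -> Vector:
    """Encode an integer class label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Sequence[Vector], h: Vector) -> Vector:
    """Hypothetical toy classifier head: one logit per weight row, no bias."""
    return [sum(w * v for w, v in zip(row, h)) for row in weights]


def pseudo_label(probs: Sequence[Vector], threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Return (index, predicted class) for predictions whose top probability >= threshold."""
    selected = []
    for idx, p in enumerate(probs):
        best = max(range(len(p)), key=p.__getitem__)
        if p[best] >= threshold:
            selected.append((idx, best))
    return selected


def latent_mixup(h_i: Vector, y_i: Vector, h_j: Vector, y_j: Vector,
                 lam: float) -> Tuple[Vector, Vector]:
    """Interpolate two latent vectors (encoder outputs) and their label vectors."""
    h_mix = [lam * a + (1.0 - lam) * b for a, b in zip(h_i, h_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return h_mix, y_mix


def semi_supervised_latent_mixup(
        encoder: Callable[[Vector], Vector],
        head_weights: Sequence[Vector],
        labeled: Sequence[Tuple[Vector, int]],
        unlabeled: Sequence[Vector],
        num_classes: int,
        threshold: float = 0.9,
        alpha: float = 0.2,
        rng: Optional[random.Random] = None) -> List[Tuple[Vector, Vector]]:
    """Hypothetical helper: one batch of latent MixUp targets from labeled + pseudo-labeled data."""
    rng = rng or random.Random()
    # 1. Encode labeled samples; targets are one-hot ground-truth labels.
    pool = [(encoder(x), one_hot(y, num_classes)) for x, y in labeled]
    # 2. Encode unlabeled samples; keep only confident predictions as pseudo-labels.
    latents_u = [encoder(x) for x in unlabeled]
    probs_u = [softmax(linear(head_weights, h)) for h in latents_u]
    for idx, cls in pseudo_label(probs_u, threshold):
        pool.append((latents_u[idx], one_hot(cls, num_classes)))
    # 3. Mix each latent vector with a shuffled partner, one lambda per batch.
    partners = list(range(len(pool)))
    rng.shuffle(partners)
    lam = rng.betavariate(alpha, alpha)
    return [latent_mixup(*pool[i], *pool[j], lam) for i, j in enumerate(partners)]
```

In a real training loop, each mixed latent vector would then be passed through the classifier head and scored against its mixed label (for example with a soft-label cross-entropy); that gradient step is omitted here. Mixing in latent space rather than on raw signals means the interpolation happens between learned features, which is the distinction the abstract draws between LatentMixUp++ and MixUp++.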