Gaussian processes based data augmentation and expected signature for time series classification

Marco Romito, Francesco Triggiano
2023-10-17
Abstract: The signature is a fundamental object that describes paths (that is, continuous functions from an interval to a Euclidean space). Likewise, the expected signature provides a statistical description of the law of stochastic processes. We propose a feature extraction model for time series built upon the expected signature. This is computed through a Gaussian processes based data augmentation. One of the main features is that an optimal feature extraction is learnt through the supervised task that uses the model.
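To make the two objects in the abstract concrete, here is a minimal sketch, assuming each path is the piecewise-linear interpolation of its sampled points and truncating the signature at level 2. The function names are hypothetical, not taken from the paper. A linear segment with increment Δ has signature (1, Δ, Δ⊗Δ/2, ...), Chen's identity combines consecutive segments, and the expected signature is estimated by averaging the signatures of sample paths.

```python
import numpy as np


def signature_depth2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    path: array of shape (T, d), the sampled points of the path.
    Returns (s1, s2) with s1 of shape (d,) and s2 of shape (d, d).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending a linear segment with increment delta:
        # level 2 gains the cross term s1 (x) delta plus delta (x) delta / 2.
        # s2 must be updated before s1, since it uses the old s1.
        s2 = s2 + np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 = s1 + delta
    return s1, s2


def expected_signature_depth2(paths):
    """Monte Carlo estimate of the depth-2 expected signature.

    paths: array of shape (N, T, d), N sample paths of the same process.
    """
    sigs = [signature_depth2(p) for p in paths]
    s1 = np.mean([s[0] for s in sigs], axis=0)
    s2 = np.mean([s[1] for s in sigs], axis=0)
    return s1, s2
```

For the L-shaped path (0, 0) → (1, 0) → (1, 1), this gives s1 = (1, 1) and s2 = [[0.5, 1], [0, 0.5]]. The antisymmetric part of s2 is the Lévy area 0.5, order information that the total increment s1 alone cannot capture.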
Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is improving the accuracy and robustness of time-series classification, especially when the task depends on the statistical features of the series rather than only on their observed values. To this end, the authors propose a data augmentation method based on Gaussian processes (GP), combined with a time-series feature extraction model built on the expected signature.

### Main Problems and Solutions

1. **Capturing the Statistical Features of Time Series**:
   - Many traditional time-series classification methods work directly with the observed values of a series and ignore the statistical properties of the process that generated it. By using the expected signature, the paper describes time series through the law of the underlying stochastic process, which better captures their essential features.
   - The expected signature summarizes the law of a stochastic process and, under suitable conditions, determines that law uniquely.
2. **Combining Data Augmentation and Feature Extraction**:
   - To generate more training samples and improve generalization, the authors propose a data augmentation method based on Gaussian process regression. It increases the amount of data while keeping the generated time series statistically similar to the original data.
   - The augmented samples are then used to compute the expected signature, which strengthens the model's ability to capture the features of each time series.
3. **Optimizing Feature Extraction**:
   - The computation of the expected signature is optimized through the supervised learning task itself: the model learns the best feature extraction for the classification problem at hand, so that feature extraction and classification are tightly coupled, which the authors credit with improving overall performance.
4. **Normalization**:
   - Normalizing the signature is a key step in ensuring that the expected signature faithfully represents the law of a stochastic process. The paper stresses its importance and points out that an inappropriate normalization can make the model unstable or degrade its performance.

### Overview of Model Architecture

- **Data Augmentation Module**: generates multiple new time-series samples by Gaussian process regression.
- **Expected Signature Module**: computes the expected signature of the augmented samples and normalizes it.
- **Classification Module**: classifies the resulting expected signature with a softmax layer.

### Experimental Results

The authors evaluate the method on several real and synthetic datasets. According to their results, it performs well on time-series classification, with a clear advantage on tasks that depend on the statistical features of the series.

In conclusion, the paper proposes a time-series classification model that combines Gaussian process data augmentation with expected-signature feature extraction, and the authors report that this combination improves classification accuracy and robustness.
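The data augmentation module described above can be sketched as standard GP regression followed by sampling from the posterior. This is a minimal sketch under stated assumptions: a single univariate channel (a multivariate series would be handled channel by channel), a squared-exponential (RBF) kernel with fixed, hand-picked hyperparameters, and Gaussian observation noise. The paper's kernel choice, hyperparameter fitting, and treatment of multivariate series may differ, and the names `rbf_kernel` and `gp_augment` are hypothetical. The posterior mean is K_*ᵀ(K + σ²I)⁻¹y and the posterior covariance is K_** − K_*ᵀ(K + σ²I)⁻¹K_*, so sampled series stay close to the data where it is observed and spread out between observations.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between 1-D time grids a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_augment(t_obs, y_obs, t_new, n_samples, length_scale=0.2,
               variance=1.0, noise=1e-2, seed=0):
    """Draw n_samples series from the GP posterior given (t_obs, y_obs).

    Returns an array of shape (n_samples, len(t_new)).
    """
    t_obs, y_obs, t_new = (np.asarray(x, dtype=float) for x in (t_obs, y_obs, t_new))
    rng = np.random.default_rng(seed)
    # Prior covariances; `noise` is the observation-noise variance sigma^2.
    K = rbf_kernel(t_obs, t_obs, length_scale, variance) + noise * np.eye(len(t_obs))
    K_s = rbf_kernel(t_obs, t_new, length_scale, variance)
    K_ss = rbf_kernel(t_new, t_new, length_scale, variance)
    # Posterior mean and covariance via a Cholesky factor of K.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    # Small jitter keeps the posterior covariance numerically positive definite.
    L_post = np.linalg.cholesky(cov + 1e-6 * np.eye(len(t_new)))
    z = rng.standard_normal((len(t_new), n_samples))
    return (mean[:, None] + L_post @ z).T
```

Each row of the output is one synthetic series on the grid `t_new`; these rows play the role of the augmented samples from which the expected signature is estimated.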
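The summary stresses that normalization matters but does not say which normalization the paper uses. As an illustration only, the sketch below applies one common idea from the signature literature: a dilation that multiplies level k of the signature by λᵏ, with λ chosen so that the squared norm of the depth-2 truncated signature (1, s1, s2) does not exceed a cap C. The hard cap, its default value, and the name `normalize_depth2` are assumptions, not the paper's choices. Since ‖δ_λS‖² = 1 + λ²‖s1‖² + λ⁴‖s2‖², λ² is the positive root of a quadratic whenever the norm exceeds C.

```python
import numpy as np


def normalize_depth2(s1, s2, cap=4.0):
    """Dilate a depth-2 signature (1, s1, s2) so its squared norm is at most cap.

    Level k is multiplied by lam**k; lam = 1 when the norm is already small enough.
    cap must exceed 1, because the constant level-0 term contributes 1.
    """
    if cap <= 1.0:
        raise ValueError("cap must be greater than 1")
    a = float(np.sum(s2**2))  # squared norm of level 2
    b = float(np.sum(s1**2))  # squared norm of level 1
    if 1.0 + b + a <= cap:
        return s1, s2, 1.0
    # Solve a*x**2 + b*x + (1 - cap) = 0 for x = lam**2, which lies in (0, 1).
    if a == 0.0:
        x = (cap - 1.0) / b
    else:
        x = (-b + np.sqrt(b * b + 4.0 * a * (cap - 1.0))) / (2.0 * a)
    lam = float(np.sqrt(x))
    return lam * s1, lam**2 * s2, lam
```

In a pipeline of this kind, the normalization would be applied to each augmented sample's signature before averaging, so that every contribution to the expected signature is bounded.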
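The classification module, a softmax layer on top of the expected-signature features, can be sketched as multinomial logistic regression trained by plain gradient descent on fixed feature vectors (for example, flattened expected signatures). This is a simplification: in the paper the feature extraction itself is tuned through the supervised task, which this sketch does not attempt. The names `train_softmax` and `predict`, the learning rate, and the number of epochs are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_softmax(X, y, n_classes, lr=0.1, epochs=500, seed=0):
    """Fit weights W (d, n_classes) and bias b by minimizing mean cross-entropy.

    X: array (n, d) of feature vectors; y: integer labels in [0, n_classes).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = 0.01 * rng.standard_normal((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot labels
    for _ in range(epochs):
        P = softmax(X @ W + b)
        grad = (P - Y) / n  # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b):
    """Return the most probable class for each row of X."""
    return np.argmax(X @ W + b, axis=1)
```

The gradient of the cross-entropy with respect to the logits is simply the predicted probabilities minus the one-hot labels, which keeps the update loop short; an autodiff framework would be needed to also propagate gradients back into the augmentation and signature stages.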