Time series representation and similarity based on local autopatterns

Mustafa Gokce Baydogan, George Runger
DOI: https://doi.org/10.1007/s10618-015-0425-y
IF: 5.406
2015-07-07
Data Mining and Knowledge Discovery
Abstract: Time series data mining has received much greater interest along with the increase in temporal data sets from different domains such as medicine, finance, multimedia, etc. Representations are important to reduce dimensionality and generate useful similarity measures. High-level representations such as Fourier transforms, wavelets, piecewise polynomial models, etc., were considered previously. Recently, autoregressive kernels were introduced to reflect the similarity of the time series. We introduce a novel approach to model the dependency structure in time series that generalizes the concept of autoregression to local autopatterns. Our approach generates a pattern-based representation along with a similarity measure called learned pattern similarity (LPS). A tree-based ensemble-learning strategy that is fast and insensitive to parameter settings is the basis for the approach. Then, a robust similarity measure based on the learned patterns is presented. This unsupervised approach to represent and measure the similarity between time series generally applies to a number of data mining tasks (e.g., clustering, anomaly detection, classification). Furthermore, an embedded learning of the representation avoids pre-defined features and an extraction step which is common in some feature-based approaches. The method generalizes in a straightforward manner to multivariate time series. The effectiveness of LPS is evaluated on time series classification problems from various domains. We compare LPS to eleven well-known similarity measures. Our experimental results show that LPS provides fast and competitive results on benchmark datasets from several domains. Furthermore, LPS provides a research direction and template approach that breaks from the linear dependency models to potentially foster other promising nonlinear approaches.
computer science, information systems, artificial intelligence
What problem does this paper attempt to address?
The paper addresses two linked problems in time series data mining: reducing the dimensionality of the data and producing an effective similarity measure. The authors model the dependency structure in a time series by generalizing autoregression to local autopatterns, and from the learned patterns they derive a similarity measure called Learned Pattern Similarity (LPS). The goal is a fast and competitive representation and similarity measure that applies to data mining tasks such as clustering, anomaly detection, and classification.

### Core contributions of the paper:

1. **Local Autopatterns**: The paper models the dependency structure in time series by extending autoregression to local autopatterns. These patterns can capture nonlinear dependencies, not only linear ones.
2. **Learned Pattern Similarity (LPS)**: Building on the learned local autopatterns, the paper defines a new similarity measure, LPS, for comparing time series.
3. **Tree-based Ensemble Learning Strategy**: The approach rests on a tree-based ensemble learning strategy that is fast and insensitive to parameter settings. The ensemble learns the dependency relationships in the series, and the similarity measure is built from what it learns.
4. **Unsupervised Method**: LPS is unsupervised. Because the representation is learned as part of the method, it avoids the pre-defined features and separate extraction step common in some feature-based approaches.
5. **Extension to Multivariate Time Series**: LPS extends naturally to multivariate time series and can model interactions between different attributes.

### Application scenarios:

- **Clustering**: The representation generated by LPS can be used to group time series with similar patterns.
- **Anomaly Detection**: Anomalies can be identified by comparing new observations with historical data under the LPS similarity.
- **Classification**: The similarity measure can drive time series classification, which is the task the paper uses to evaluate LPS.

### Experimental Results:

The paper evaluates LPS on benchmark datasets from several domains and compares it with eleven well-known similarity measures. The results show that LPS is fast and competitive on these benchmarks.

In conclusion, by introducing local autopatterns and learned pattern similarity, the paper provides a new time series representation and similarity measure that reduces dimensionality and applies to a range of data mining tasks.
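
To make the idea of a tree-based, pattern-based similarity more concrete, here is a minimal Python sketch. It is not the authors' implementation and it simplifies the method considerably: each tree predicts the next value of a series from a randomly sized window of preceding values, each series is then described by how often its windows fall into each leaf, and two series are compared by histogram intersection. The function names (`lagged_matrix`, `leaf_histograms`, `pattern_similarity`), the parameters, the use of scikit-learn, and the choice of histogram intersection are all assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def lagged_matrix(series, lag):
    """Return (X, y): each row of X holds `lag` consecutive values, y the value that follows."""
    values = np.asarray(series, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(values, lag + 1)
    return windows[:, :-1], windows[:, -1]


def leaf_histograms(dataset, n_trees=20, max_lag=5, max_depth=4, seed=0):
    """Describe each series by how often its windows land in each tree leaf.

    Assumes every series is longer than `max_lag`.
    """
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(n_trees):
        lag = int(rng.integers(1, max_lag + 1))  # random window length per tree
        parts = [lagged_matrix(s, lag) for s in dataset]
        X = np.vstack([x for x, _ in parts])
        y = np.concatenate([t for _, t in parts])
        # owner[r] is the index of the series that row r came from
        owner = np.concatenate([np.full(len(t), i) for i, (_, t) in enumerate(parts)])

        # A shallow regression tree fitted on the pooled windows of all series
        tree = DecisionTreeRegressor(max_depth=max_depth,
                                     random_state=int(rng.integers(2**31)))
        tree.fit(X, y)

        # Map leaf node ids to 0..n_leaves-1, then count windows per (series, leaf)
        _, leaf = np.unique(tree.apply(X), return_inverse=True)
        counts = np.zeros((len(dataset), leaf.max() + 1))
        np.add.at(counts, (owner, leaf), 1.0)
        blocks.append(counts / counts.sum(axis=1, keepdims=True))
    # Concatenate per-tree histograms; dividing by n_trees makes each row sum to 1
    return np.hstack(blocks) / n_trees


def pattern_similarity(hist):
    """Pairwise histogram intersection; values lie in [0, 1], with 1 on the diagonal."""
    return np.minimum(hist[:, None, :], hist[None, :, :]).sum(axis=2)


if __name__ == "__main__":
    t = np.linspace(0, 8 * np.pi, 200)
    noise = np.random.default_rng(1).normal(size=200)
    data = [np.sin(t), np.sin(t + 0.5), noise]
    print(np.round(pattern_similarity(leaf_histograms(data)), 2))
```

The resulting matrix could be passed to a nearest-neighbor classifier or a clustering routine that accepts precomputed similarities. Because the trees are fitted without class labels, the sketch stays unsupervised, in line with the description above.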