Matrix Profile for Anomaly Detection on Multidimensional Time Series

Chin-Chia Michael Yeh, Audrey Der, Uday Singh Saini, Vivian Lai, Yan Zheng, Junpeng Wang, Xin Dai, Zhongfang Zhuang, Yujie Fan, Huiyuan Chen, Prince Osei Aboagye, Liang Wang, Wei Zhang, Eamonn Keogh
2024-09-14
Abstract: The Matrix Profile (MP), a versatile tool for time series data mining, has been shown effective in time series anomaly detection (TSAD). This paper delves into the problem of anomaly detection in multidimensional time series, a common occurrence in real-world applications. For instance, in a manufacturing factory, multiple sensors installed across the site collect time-varying data for analysis. The Matrix Profile, named for its role in profiling the matrix storing pairwise distances between subsequences of a univariate time series, becomes complex in multidimensional scenarios. If the input univariate time series has n subsequences, the pairwise distance matrix is an n x n matrix. In a multidimensional time series with d dimensions, the pairwise distance information must be stored in an n x n x d tensor. In this paper, we first analyze different strategies for condensing this tensor into a profile vector. We then investigate the potential of extending the MP to efficiently find k-nearest neighbors for anomaly detection. Finally, we benchmark the multidimensional MP against 19 baseline methods on 119 multidimensional TSAD datasets. The experiments cover three learning setups: unsupervised, supervised, and semi-supervised. MP is the only method that consistently delivers high performance across all setups.
Machine Learning, Artificial Intelligence, Databases
What problem does this paper attempt to address?
This paper addresses anomaly detection in multidimensional time series. Specifically, it explores how to apply the Matrix Profile (MP) technique to multidimensional time series data in order to detect abnormal patterns. Multidimensional time series are common in practice: in a manufacturing plant, for example, multiple sensors collect time-varying data for analysis. Anomaly detection is harder here than in the one-dimensional case because abnormal patterns usually appear in only a few dimensions rather than in all of them. Consequently, the per-dimension distances cannot simply be summed, because the abnormal patterns would be drowned out by the many normal dimensions.

### Main contributions of the paper

1. **Construction of multidimensional matrix profiles**:
   - The paper first analyzes different strategies for condensing the pairwise distance tensor of a multidimensional time series into a profile vector.
   - Two main strategies are proposed: post-sorting and pre-sorting. They sort the per-dimension distances after and before the nearest-neighbor search, respectively, to determine the most abnormal dimensions.
2. **Extension of matrix profiles for efficient k-nearest-neighbor lookup**:
   - To improve anomaly detection performance, the paper extends the MP technique so that it can efficiently find the k-th nearest neighbor, not just the nearest neighbor. This helps with abnormal patterns that occur repeatedly, since a repeated anomaly has a close nearest neighbor (its own repetition) and would otherwise look normal.
3. **Benchmarking**:
   - The paper benchmarks the multidimensional MP on 119 multidimensional time series anomaly detection datasets and compares it with 19 baseline methods. The experiments cover three learning settings: unsupervised, supervised, and semi-supervised. The results show that the multidimensional MP maintains high performance in all settings.

### Key technical details

- **Calculation of multidimensional matrix profiles**:
  - **Post-sorting**: Sort after finding the nearest neighbor in each dimension. The time complexity is \(O(n_1 d \log d)\).
  - **Pre-sorting**: Sort before finding the nearest neighbor. The time complexity is \(O(n_1 n_2 d \log d)\).
  - **Max operation**: It can replace the sorting operation and further reduce the time complexity.
- **k-nearest-neighbor lookup algorithm**:
  - The paper proposes an efficient k-nearest-neighbor selection algorithm that accounts for trivial matches. Its time complexity is \(O(n_1 n_2 d)\), which is better than traditional brute-force search followed by sorting.

### Experimental results

- **Performance comparison**:
  - The multidimensional MP performs well in all three learning settings: unsupervised, supervised, and semi-supervised. The pre-sorting strategy performs better in particular on anomalies caused by changes in cross-dimensional correlations.
  - In terms of actual running time, the post-sorting and max operation strategies are close to each other, while the pre-sorting strategy is slower.

### Conclusion

By introducing multidimensional matrix profiles and an efficient k-nearest-neighbor lookup algorithm, the paper extends MP-based anomaly detection to multidimensional time series. The methods perform well across the unsupervised, supervised, and semi-supervised settings.
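
The difference between the post-sorting and pre-sorting strategies can be illustrated with a minimal brute-force sketch in NumPy. This is not the paper's implementation: the function names are hypothetical, the distance tensor is built naively instead of with the efficient MP algorithms, and trivial-match handling in the k-th neighbor search is omitted. It also assumes that the distance is z-normalized Euclidean, that \(n_1\) and \(n_2\) are the numbers of subsequences in the test and reference series, and that the sorted per-dimension distances are aggregated by averaging the `top_m` largest ones (with `top_m=1` this reduces to a max).

```python
import numpy as np


def znorm_subsequences(series, m):
    """Return all z-normalized length-m subsequences of a 1-D series."""
    windows = np.lib.stride_tricks.sliding_window_view(series, m).astype(float)
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # flat windows: avoid division by zero
    return (windows - mean) / std


def distance_tensor(test, train, m):
    """Brute-force distances of shape (n1, n2, d); inputs are (length, d)."""
    per_dim = []
    for dim in range(test.shape[1]):
        a = znorm_subsequences(test[:, dim], m)   # (n1, m)
        b = znorm_subsequences(train[:, dim], m)  # (n2, m)
        diff = a[:, None, :] - b[None, :, :]
        per_dim.append(np.sqrt((diff ** 2).sum(axis=2)))
    return np.stack(per_dim, axis=2)


def kth_smallest(values, k, axis):
    """k-th smallest value along an axis (k=1 is the nearest neighbor).

    Assumption: trivial matches are ignored in this sketch.
    """
    return np.take(np.partition(values, k - 1, axis=axis), k - 1, axis=axis)


def top_mean(values, top_m):
    """Mean of the top_m largest entries along the last axis (sort-based)."""
    return np.sort(values, axis=-1)[..., -top_m:].mean(axis=-1)


def post_sorting_profile(dist, top_m=1, k=1):
    nn = kth_smallest(dist, k, axis=1)  # (n1, d): neighbor search per dimension
    return top_mean(nn, top_m)          # then sort across dimensions


def pre_sorting_profile(dist, top_m=1, k=1):
    pair = top_mean(dist, top_m)          # (n1, n2): sort dimensions per pair
    return kth_smallest(pair, k, axis=1)  # then neighbor search


# Toy data: three periodic dimensions; the anomaly affects only dimension 1.
t = np.arange(300)
train = np.stack(
    [np.sin(2 * np.pi * t / 25), np.sin(2 * np.pi * t / 40), np.cos(2 * np.pi * t / 30)],
    axis=1,
)
test = train.copy()
test[150:180, 1] = 0.0  # injected flat segment

m = 20
dist = distance_tensor(test, train, m)
post = post_sorting_profile(dist)
pre = pre_sorting_profile(dist)
print("post-sorting peak at", int(np.argmax(post)))
print("pre-sorting peak at", int(np.argmax(pre)))
```

In this toy setup both profiles are zero for windows that do not touch the injected segment and peak on windows that overlap it. Post-sorting lets each dimension pick its own nearest neighbor, whereas pre-sorting forces all dimensions to share one neighbor, so the pre-sorting profile is never smaller than the post-sorting one when `k=1`.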