Abstract: Benchmarking anomaly detection approaches for multivariate time series is challenging due to the lack of high-quality datasets. Current publicly available datasets are too small, lack diversity and feature trivial anomalies, which hinders measurable progress in this research area. We propose a solution: a diverse, extensive, and non-trivial dataset generated via state-of-the-art simulation tools that reflects realistic behaviour of an automotive powertrain, including its multivariate, dynamic and variable-state properties. To cater for both unsupervised and semi-supervised anomaly detection settings, as well as time series generation and forecasting, we make different versions of the dataset available, where training and test subsets are offered in contaminated and clean versions, depending on the task. We also provide baseline results from a small selection of approaches based on deterministic and variational autoencoders, as well as a non-parametric approach. As expected, the baseline experimentation shows that the approaches trained on the semi-supervised version of the dataset outperform their unsupervised counterparts, highlighting a need for approaches more robust to contaminated training data.
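The unsupervised and semi-supervised settings differ in whether the training subset is contaminated. The following is a minimal Python sketch. It assumes each simulated drive carries a sequence-level anomaly flag. `Drive`, `training_subset` and `contamination_rate` are hypothetical names for illustration; they are not the dataset's published loading code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Drive:
    """One simulated drive, stored as a discrete multivariate sequence (signal data omitted)."""
    drive_id: str
    is_anomalous: bool  # hypothetical sequence-level flag, used only to build the splits


def training_subset(pool: List[Drive], setting: str) -> List[Drive]:
    """Select training sequences for a given anomaly detection setting."""
    if setting == "semi-supervised":
        # Clean version: anomalous drives are removed before training.
        return [d for d in pool if not d.is_anomalous]
    if setting == "unsupervised":
        # Contaminated version: anomalous drives stay in; the model never sees their labels.
        return list(pool)
    raise ValueError(f"unknown setting: {setting!r}")


def contamination_rate(subset: List[Drive]) -> float:
    """Fraction of anomalous drives in a subset (0.0 for a clean subset)."""
    if not subset:
        return 0.0
    return sum(d.is_anomalous for d in subset) / len(subset)


# Usage with a toy pool of four drives, one of which is anomalous.
pool = [Drive("d0", False), Drive("d1", True), Drive("d2", False), Drive("d3", False)]
for setting in ("semi-supervised", "unsupervised"):
    subset = training_subset(pool, setting)
    print(setting, len(subset), contamination_rate(subset))
```

In this sketch the semi-supervised model trains on normal drives only. The unsupervised model has to tolerate whatever anomalies remain in its training data, and that is the setting in which the abstract reports weaker baseline results.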

What problem does this paper attempt to address?

The main problem this paper addresses is the lack of high-quality datasets for evaluating online unsupervised anomaly detection methods for multivariate time series. Existing public datasets have three shortcomings:

1. **Small scale**: they contain too few samples to provide sufficient diversity.
2. **Lack of diversity**: their time-series behaviour is homogeneous and does not cover complex real-world scenarios.
3. **Trivial anomalies**: their anomalies are too simple and do not reflect the difficulty of practical applications.

These shortcomings hinder measurable progress in this research field. The authors therefore propose a diverse, extensive and non-trivial dataset (the PATH dataset), generated with state-of-the-art simulation tools, that reflects the multivariate, dynamic and variable-state behaviour of an automotive powertrain. Specifically, the dataset addresses the following issues:

- **Lack of diversity**: diversity comes from multiple driving cycles and randomised initial conditions (such as battery temperature and state of charge).
- **Trivial anomalies**: complex and realistic anomalies are obtained by simulating six different anomaly types (such as disabling regenerative braking or increasing headwind resistance).
- **Multiple task settings**: unsupervised and semi-supervised anomaly detection, as well as time-series generation and forecasting, are supported by releasing different versions of the dataset in which the training and test subsets come in contaminated and clean variants.

### Key contributions

1. **High-quality dataset**: a new dataset, PATH, whose complexity and realism better reflect practical application scenarios.
2. **Diverse anomaly types**: multiple anomaly types, including subsequence anomalies and full-sequence anomalies, which make the benchmark more challenging.
3. **Baseline results**: results for approaches based on deterministic and variational autoencoders and for a non-parametric approach. As expected, approaches trained on the semi-supervised (clean) version outperform their unsupervised counterparts, which highlights the need for approaches that are more robust to contaminated training data.

Through these contributions, the paper provides a more reliable and more challenging benchmark for research on multivariate time-series anomaly detection.
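The summary does not say how a sequence-level decision is made when an anomaly covers only part of a drive. One simple scheme handles both subsequence and full-sequence anomalies. It slides fixed-width windows over the drive, scores each window by its distance to windows from clean training drives, and takes the maximum as the sequence score. The Python sketch below uses a k-nearest-neighbour distance as the non-parametric window score. This is a swapped-in illustration, not necessarily the paper's non-parametric baseline. An autoencoder baseline would replace `knn_score` with a reconstruction error. All function names, the window width and the threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Window = Tuple[float, ...]


def sliding_windows(series: Sequence[Sequence[float]], width: int, step: int = 1) -> List[Window]:
    """Cut a multivariate series (time steps x channels) into flattened windows."""
    if len(series) < width:
        raise ValueError("series is shorter than the window width")
    return [
        tuple(value for row in series[start:start + width] for value in row)
        for start in range(0, len(series) - width + 1, step)
    ]


def knn_score(window: Window, reference: List[Window], k: int = 1) -> float:
    """Non-parametric score: mean Euclidean distance to the k nearest reference windows."""
    nearest = sorted(math.dist(window, ref) for ref in reference)[:k]
    return sum(nearest) / len(nearest)


def sequence_score(series, reference: List[Window], width: int, k: int = 1) -> float:
    """Sequence-level score: the maximum window score.

    A subsequence anomaly raises only a few window scores, which is enough to
    raise the maximum; a full-sequence anomaly raises all of them.
    """
    return max(knn_score(w, reference, k) for w in sliding_windows(series, width))


def is_anomalous(series, reference: List[Window], width: int, threshold: float, k: int = 1) -> bool:
    """Flag the whole sequence if its score exceeds a (hypothetical) threshold."""
    return sequence_score(series, reference, width, k) > threshold


# Usage: two channels (a ramp and a constant); the faulty drive drops channel 2 once.
train = [(0.0, 1.0), (0.1, 1.0), (0.2, 1.0), (0.3, 1.0), (0.4, 1.0)]
reference = sliding_windows(train, width=2)
faulty = [(0.0, 1.0), (0.1, 1.0), (0.2, 0.0), (0.3, 1.0), (0.4, 1.0)]
print(sequence_score(faulty, reference, width=2))  # 1.0
```

Taking the maximum keeps a short fault from being averaged away. The price is sensitivity to single noisy windows. One option is to calibrate the threshold on clean validation drives.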

A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series
