Abstract: Unsupervised anomaly detection in time-series has been extensively investigated in the literature. Notwithstanding the relevance of this topic in numerous application fields, a comprehensive and extensive evaluation of recent state-of-the-art techniques taking into account real-world constraints is still needed. Some efforts have been made to compare existing unsupervised time-series anomaly detection methods rigorously. However, only standard performance metrics, namely precision, recall, and F1-score, are usually considered. Essential aspects for assessing their practical relevance are therefore neglected. This paper proposes an in-depth evaluation study of recent unsupervised anomaly detection techniques in time-series. Instead of relying solely on standard performance metrics, additional yet informative metrics and protocols are taken into account. In particular, (i) more elaborate performance metrics specifically tailored for time-series are used; (ii) the model size and the model stability are studied; (iii) an analysis of the tested approaches with respect to the anomaly type is provided; and (iv) a clear and unique protocol is followed for all experiments. Overall, this extensive analysis aims to assess the maturity of state-of-the-art time-series anomaly detection, give insights regarding its applicability under real-world setups, and provide the community with a more complete evaluation protocol.

What problem does this paper attempt to address?

This paper addresses the insufficient evaluation of unsupervised time-series anomaly detection methods under practical conditions. Although the area has been studied extensively, existing evaluations focus mainly on standard performance metrics (precision, recall, and F1-score) and neglect other aspects that matter in real-world use. Specifically, the paper targets the following gaps:

1. **Model size and stability**: Existing evaluations often overlook model size and stability, both of which affect a model's scalability and the reliability of its reported performance. A stable model is one whose performance varies little across different training runs.
2. **Unified experimental protocol**: There is no clear, shared experimental protocol for evaluating state-of-the-art methods, which leads to large differences in reported results across studies. For example, the evaluation protocol known as "Point Adjustment (PA)", introduced by Xu et al. (2018), is widely used in some studies but ignored in others.
3. **Time-series performance metrics**: Standard performance metrics (precision, recall, and F1-score) may not be entirely suitable for evaluating time-series anomaly detectors. They were originally designed for time-independent, point-wise predictions rather than range-based predictions. Tatbul et al. (2018) proposed extended performance metrics for time series, but current state-of-the-art methods are generally not evaluated with these newer criteria.
4. **Experimental analysis by anomaly type**: Although some work has rigorously defined the different anomaly types that occur in time series, no detailed experimental analysis of how methods perform on each type has been carried out.
5. **Comparison with traditional machine-learning methods**: Wu and Keogh (2021) and Audibert et al. (2022) emphasize the importance of comparing traditional machine-learning strategies with deep-learning methods, yet some recent studies focus on deep learning while ignoring traditional techniques.

By addressing these gaps, the paper aims to provide a comprehensive evaluation study that covers recent unsupervised deep-learning techniques and assesses their practical value from multiple perspectives. This should help the community better understand the strengths and limitations of state-of-the-art techniques and provide a basis for future experimental evaluation practices.
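To illustrate why the choice of protocol matters, the Point Adjustment (PA) protocol mentioned above can be sketched as follows. Under PA, if at least one point inside a ground-truth anomaly segment is flagged, every point of that segment is counted as detected. The function names `point_adjust` and `precision_recall_f1` and the toy labels are assumptions made for this illustration, not code from the paper.

```python
from typing import List, Tuple


def point_adjust(y_true: List[int], y_pred: List[int]) -> List[int]:
    """Return a copy of y_pred in which every ground-truth anomaly segment
    that contains at least one detected point is marked as fully detected."""
    adjusted = list(y_pred)
    n = len(y_true)
    i = 0
    while i < n:
        if y_true[i] == 1:
            # Find the end of the contiguous anomaly segment [i, j).
            j = i
            while j < n and y_true[j] == 1:
                j += 1
            # One hit anywhere in the segment credits the whole segment.
            if any(y_pred[k] == 1 for k in range(i, j)):
                for k in range(i, j):
                    adjusted[k] = 1
            i = j
        else:
            i += 1
    return adjusted


def precision_recall_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Point-wise precision, recall, and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example (hypothetical data): two anomaly segments, one detected point.
y_true = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

raw_f1 = precision_recall_f1(y_true, y_pred)[2]
pa_f1 = precision_recall_f1(y_true, point_adjust(y_true, y_pred))[2]
print(f"F1 without PA: {raw_f1:.3f}")  # 0.250
print(f"F1 with PA:    {pa_f1:.3f}")   # 0.727
```

On this toy input a single correct point raises F1 from 0.25 to about 0.73 once PA is applied, which is why results reported with and without PA are not directly comparable.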

Unsupervised Anomaly Detection in Time-series: An Extensive Evaluation and Analysis of State-of-the-art Methods

A Comparative Study on Unsupervised Anomaly Detection for Time Series: Experiments and Analysis

Unsupervised Model Selection for Time-series Anomaly Detection

An Experimental Evaluation of Anomaly Detection in Time Series

An Evaluation of Anomaly Detection and Diagnosis in Multivariate Time Series

Anomaly Detection in Univariate Time-series: A Survey on the State-of-the-Art

Unsupervised Anomaly Detection for IoT-Based Multivariate Time Series: Existing Solutions, Performance Analysis and Future Directions

Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions

An Unsupervised Anomaly Detection Algorithm for Time Series Big Data

Experimental Comparison and Survey of Twelve Time Series Anomaly Detection Algorithms

Is it worth it? Comparing six deep and classical methods for unsupervised anomaly detection in time series

A novel unsupervised framework for time series data anomaly detection via spectrum decomposition

An Enhancing Timeseries Anomaly Detection Using LSTM and Bi-LSTM Architectures

Position: Quo Vadis, Unsupervised Time Series Anomaly Detection?

Developing an Unsupervised Real-Time Anomaly Detection Scheme for Time Series with Multi-Seasonality

Multivariate Time Series Anomaly Detection: Fancy Algorithms and Flawed Evaluation Methodology

Local Evaluation of Time Series Anomaly Detection Algorithms

A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data

Dive into Time-Series Anomaly Detection: A Decade Review

Unsupervised Anomaly Detection for Time Series with Outlier Exposure

Navigating the Metric Maze: A Taxonomy of Evaluation Metrics for Anomaly Detection in Time Series