Unsupervised Anomaly Detection in Time-series: An Extensive Evaluation and Analysis of State-of-the-art Methods

Nesryne Mejri, Laura Lopez-Fuentes, Kankana Roy, Pavel Chernakov, Enjie Ghorbel, Djamila Aouada
DOI: https://doi.org/10.1016/j.eswa.2024.124922
2024-08-12
Abstract: Unsupervised anomaly detection in time-series has been extensively investigated in the literature. Notwithstanding the relevance of this topic in numerous application fields, a comprehensive and extensive evaluation of recent state-of-the-art techniques taking into account real-world constraints is still needed. Some efforts have been made to compare existing unsupervised time-series anomaly detection methods rigorously. However, only standard performance metrics, namely precision, recall, and F1-score are usually considered. Essential aspects for assessing their practical relevance are therefore neglected. This paper proposes an in-depth evaluation study of recent unsupervised anomaly detection techniques in time-series. Instead of relying solely on standard performance metrics, additional yet informative metrics and protocols are taken into account. In particular, (i) more elaborate performance metrics specifically tailored for time-series are used; (ii) the model size and the model stability are studied; (iii) an analysis of the tested approaches with respect to the anomaly type is provided; and (iv) a clear and unique protocol is followed for all experiments. Overall, this extensive analysis aims to assess the maturity of state-of-the-art time-series anomaly detection, give insights regarding their applicability under real-world setups and provide to the community a more complete evaluation protocol.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the insufficient evaluation of unsupervised time-series anomaly detection methods under practical conditions. Despite extensive research in this area, existing evaluations focus mainly on standard performance metrics (precision, recall, and F1-score) and ignore other aspects that matter in real-world use. Specifically, the paper targets five gaps:

1. **Model size and stability**: Existing evaluations often overlook model size and model stability, which affect scalability and the reliability of reported performance. A stable model is one whose performance varies little across training trials.
2. **Unified experimental protocol**: There is no clear, shared experimental protocol for evaluating state-of-the-art methods, so reported results differ widely between studies. For example, the "Point Adjustment (PA)" evaluation protocol introduced by Xu et al. (2018) is widely used in some studies but ignored in others.
3. **Time-series performance metrics**: Standard performance metrics (precision, recall, and F1-score) may not be entirely suitable for evaluating time-series anomaly detectors, because they were originally designed for time-independent predictions rather than range-based predictions. Tatbul et al. (2018) proposed extended performance metrics for time series, but current state-of-the-art methods do not report these newer criteria.
4. **Experimental analysis by anomaly type**: Although some work has rigorously defined the different anomaly types that occur in time series, no detailed experimental analysis of how methods perform on each type has been carried out.
5. **Comparison with traditional machine-learning methods**: Wu and Keogh (2021) and Audibert et al. (2022) emphasize the importance of comparing traditional machine-learning strategies with deep-learning methods, yet some recent studies focus on deep learning and ignore traditional techniques.

By addressing these gaps, the paper provides a comprehensive evaluation study that covers the latest unsupervised deep-learning techniques and assesses their practical value from multiple perspectives. This helps the community better understand the advantages and limitations of state-of-the-art techniques and lays a foundation for future experimental evaluation practice.
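
To make the protocol issue more concrete: Point Adjustment (PA) is commonly described as marking every point of a ground-truth anomaly segment as detected once at least one point inside that segment has been flagged. The Python sketch below works under that assumption. The function names `point_adjust` and `precision_recall_f1` and the toy labels are illustrative and are not taken from the paper. It compares the point-wise F1-score with and without the adjustment.

```python
from typing import List, Sequence, Tuple


def point_adjust(y_true: Sequence[int], y_pred: Sequence[int]) -> List[int]:
    """Return a copy of y_pred where each ground-truth anomaly segment
    containing at least one predicted anomaly is marked as fully detected."""
    adjusted = list(y_pred)
    n = len(y_true)
    i = 0
    while i < n:
        if y_true[i] == 1:
            # Find the end (exclusive) of the contiguous anomaly segment.
            j = i
            while j < n and y_true[j] == 1:
                j += 1
            # If any point of the segment was flagged, flag the whole segment.
            if any(y_pred[k] == 1 for k in range(i, j)):
                for k in range(i, j):
                    adjusted[k] = 1
            i = j
        else:
            i += 1
    return adjusted


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Point-wise precision, recall, and F1-score for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: two anomaly segments, one of them hit by a single prediction.
labels = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
preds = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

raw_f1 = precision_recall_f1(labels, preds)[2]
pa_f1 = precision_recall_f1(labels, point_adjust(labels, preds))[2]
print(f"F1 without PA: {raw_f1:.3f}")
print(f"F1 with PA:    {pa_f1:.3f}")
```

On this toy input the point-wise F1-score rises from 0.25 to roughly 0.73 once the adjustment is applied, even though the detector's raw output is unchanged. This suggests why results reported with and without PA are hard to compare directly.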