Abstract: Unsupervised Outlier Detection (UOD) is an important data mining task. With the advance of deep learning, deep Outlier Detection (OD) has received broad interest. Most deep UOD models are trained exclusively on clean datasets to learn the distribution of normal data, which requires substantial manual effort to clean real-world data, when cleaning is possible at all. Instead of relying on clean datasets, some approaches train and detect directly on unlabeled contaminated datasets, which calls for methods that are robust under such conditions. Ensemble methods have emerged as a superior solution for improving model robustness against contaminated training sets. However, ensembling greatly increases training time. In this study, we investigate the impact of outliers on the training phase, aiming to halt training on unlabeled contaminated datasets before performance degrades. We first observe that the mixture of normal and anomalous data causes the AUC, a label-dependent measure of detection accuracy, to fluctuate during training. To remove the need for labels, we propose Loss Entropy, a zero-label entropy metric computed on the distribution of training losses, which lets us infer the optimal stopping point for training without labels. We also theoretically demonstrate a negative correlation between this entropy metric and the label-based AUC. Based on this, we develop an automated early-stopping algorithm, EntropyStop, which halts training when loss entropy indicates the model's maximal detection capability. We conduct extensive experiments on ADBench (including 47 real datasets), and the overall results indicate that an AutoEncoder (AE) enhanced by our approach not only outperforms ensemble AEs but also requires under 2% of the ensembles' training time. Lastly, we evaluate our metric and early-stopping approach on other deep OD models, showing their broad potential applicability.
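The abstract does not give the formula for Loss Entropy, so the sketch below is a minimal illustration under stated assumptions. It assumes that Loss Entropy is the Shannon entropy of the per-sample training losses after they are normalized to sum to one. It also models the stopping rule as patience-based tracking of the minimum entropy, on the reading that lower entropy corresponds to higher AUC. The names `loss_entropy`, `EntropyEarlyStopper`, `patience` and `min_delta` are hypothetical, and this is not the authors' EntropyStop implementation.

```python
import math
from typing import Optional, Sequence


def loss_entropy(losses: Sequence[float], eps: float = 1e-12) -> float:
    """Shannon entropy of per-sample losses normalized into a distribution.

    Assumption: losses are non-negative (e.g. AE reconstruction errors).
    """
    total = sum(losses)
    if total <= 0:
        raise ValueError("losses must sum to a positive value")
    entropy = 0.0
    for loss in losses:
        p = loss / total
        if p > eps:  # skip zero-probability terms (0 * log 0 is taken as 0)
            entropy -= p * math.log(p)
    return entropy


class EntropyEarlyStopper:
    """Hypothetical stopper: stop once loss entropy has not improved for a while.

    `patience` and `min_delta` are illustrative hyperparameters, not values
    taken from the EntropyStop paper.
    """

    def __init__(self, patience: int = 5, min_delta: float = 1e-4) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_entropy = math.inf
        self.best_epoch: Optional[int] = None
        self.bad_epochs = 0

    def step(self, epoch: int, losses: Sequence[float]) -> bool:
        """Record this epoch's per-sample losses; return True to stop training."""
        h = loss_entropy(losses)
        if h < self.best_entropy - self.min_delta:
            # Lower entropy is treated as a label-free proxy for higher AUC.
            self.best_entropy = h
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    # Toy per-epoch losses (made up for illustration): the last sample, an
    # "outlier", stands out more and more, then blends back in.
    history = [
        [1.0, 1.0, 1.0, 1.0],
        [0.5, 0.5, 0.5, 2.0],
        [0.2, 0.2, 0.2, 2.0],
        [0.3, 0.3, 0.3, 1.0],
        [0.5, 0.5, 0.5, 0.8],
    ]
    stopper = EntropyEarlyStopper(patience=2)
    for epoch, losses in enumerate(history):
        if stopper.step(epoch, losses):
            break
    print(stopper.best_epoch)  # prints 2
```

In a real training loop, one would checkpoint the model whenever `best_epoch` changes and restore that checkpoint after stopping. This is because the entropy minimum, not the final epoch, is the point the metric flags as best.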

Two Outlier-Sensitive Measures for Semi-supervised Dynamic Ensemble Anomaly Detection Models

Learning Discrimination from Contaminated Data: Multi-Instance Learning for Unsupervised Anomaly Detection

Fairness-aware Outlier Ensemble

Research of Anomaly detection based on Dynamic Anomaly Detection Enhancement Framework

Robust and accurate performance anomaly detection and prediction for cloud applications: a novel ensemble learning-based framework

Sparse Modeling-Based Sequential Ensemble Learning for Effective Outlier Detection in High-Dimensional Numeric Data

Exploring the Impact of Outlier Variability on Anomaly Detection Evaluation Metrics

ESAD: End-to-end Semi-supervised Anomaly Detection

Selective ensemble method for anomaly detection based on parallel learning

Multi-Scale Time Series Ensemble Learning for Information System Anomaly Detection

Experimental Study and Comparison of Imbalance Ensemble Classifiers with Dynamic Selection Strategy

Outlier detection using conditional information entropy and rough set theory

RCC-Dual-GAN: An Efficient Approach for Outlier Detection with Few Identified Anomalies

Unsupervised Model Selection for Time-series Anomaly Detection

EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy

Dual-MGAN: An Efficient Approach for Semi-supervised Outlier Detection with Few Identified Anomalies

Using Ensemble Classifiers to Detect Incipient Anomalies

Self-adaptive cost weights-based support vector machine cost-sensitive ensemble for imbalanced data classification

Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning

Revisiting Deep Ensemble Uncertainty for Enhanced Medical Anomaly Detection

Insider Threat Detection Model Enhancement Using Hybrid Algorithms between Unsupervised and Supervised Learning