Semi-supervised Anomaly Detection with Contamination-Resilience and Incremental Training

Liheng Yuan, Fanghua Ye, Heng Li, Chenhao Zhang, Cuiying Gao, Chengqing Yu, Wei Yuan, Xinge You
DOI: https://doi.org/10.1016/j.engappai.2024.109311
2024-01-01
Abstract: Anomaly detection plays a vital role in many real-world applications, including fraud detection, network traffic analysis, and medical diagnosis. Semi-supervised anomaly detection methods have recently attracted increasing attention because they require few labeled anomalous samples. However, existing semi-supervised methods suffer from performance degradation when the training data are contaminated with anomalies, and they do not readily support the incremental training required in scenarios where the original training data are hard to obtain. To overcome these limitations, we propose SAE-CRIT, a lightweight semi-supervised anomaly detection method with contamination resilience and incremental training. SAE-CRIT mitigates the negative impact of contaminated data by differentially weighting samples, and it uses a three-layer neural network to detect anomalies, which allows efficient incremental training by updating only the last layer with new data. We compare SAE-CRIT with eight anomaly detection methods on four datasets. Extensive experiments demonstrate the advantages of SAE-CRIT in contamination resistance, incremental training, and training cost. Specifically, the state-of-the-art detection method GOAD achieves F1-scores of 89.3% and 90.6% on the contaminated datasets KDDCUP and KDDCUP-Rev, respectively, whereas SAE-CRIT achieves F1-scores of 92.4% and 96.9% under the same settings. In addition, the training time of SAE-CRIT is less than 20 s on each of these two datasets, which is only 0.26% and 1.8%, respectively, of the training time of GOAD.
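The abstract does not specify how the sample weights are computed or how the last layer is updated, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes frozen tanh hidden layers, a hypothetical distance-based weighting function (`contamination_weights`), and a weighted ridge-regression last layer whose sufficient statistics can be accumulated batch by batch. All function names, the ridge parameter, and the toy data are assumptions made for illustration.

```python
import numpy as np

def hidden_features(X, W1, W2):
    """Frozen first two layers (assumed tanh activations)."""
    return np.tanh(np.tanh(X @ W1) @ W2)

def contamination_weights(X):
    """Hypothetical weighting: down-weight samples far from the median."""
    d = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    return 1.0 / (1.0 + d / (np.median(d) + 1e-12))

def fit_last_layer(H, y, s, ridge=1e-2):
    """Weighted ridge regression; returns statistics A, b and weights w."""
    Hs = H * s[:, None]                      # scale each row by its weight
    A = H.T @ Hs + ridge * np.eye(H.shape[1])
    b = Hs.T @ y
    return A, b, np.linalg.solve(A, b)

def update_last_layer(A, b, H_new, y_new, s_new):
    """Incremental step: only A and b are needed, not the old data."""
    Hs = H_new * s_new[:, None]
    A = A + H_new.T @ Hs
    b = b + Hs.T @ y_new
    return A, b, np.linalg.solve(A, b)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 16))                # frozen, randomly initialised
W2 = rng.normal(size=(16, 8))

# Initial batch: mostly unlabeled data (target 0) plus 5 labeled anomalies.
X_old = rng.normal(size=(200, 4))
y_old = np.zeros(200)
X_old[:5] += 6.0
y_old[:5] = 1.0
s_old = np.where(y_old == 1, 1.0, contamination_weights(X_old))
A, b, w = fit_last_layer(hidden_features(X_old, W1, W2), y_old, s_old)

# New batch arrives later; the old data is no longer required.
X_new = rng.normal(size=(50, 4))
y_new = np.zeros(50)
s_new = contamination_weights(X_new)
A, b, w_inc = update_last_layer(A, b, hidden_features(X_new, W1, W2), y_new, s_new)

# Reference: refit from scratch on all data with the same weights.
_, _, w_full = fit_last_layer(
    hidden_features(np.vstack([X_old, X_new]), W1, W2),
    np.concatenate([y_old, y_new]),
    np.concatenate([s_old, s_new]),
)
```

Under these assumptions the incremental update reproduces the full refit up to floating-point error, because the last layer depends on the data only through the accumulated matrices `A` and `b`. That property is what makes a last-layer-only update cheap when the original training data cannot be revisited.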
What problem does this paper attempt to address?