Developing Big Data Anomaly Dynamic and Static Detection Algorithms: AnomalyDSD Spark Package

Diego García-Gil, David López, Daniel Argüelles-Martino, Jacinto Carrasco, Ignacio Aguilera-Martos, Julián Luengo, Francisco Herrera
DOI: https://doi.org/10.1016/j.ins.2024.121587
IF: 8.1
2024-10-29
Information Sciences
Abstract: Background: Anomaly detection is the process of identifying observations that differ greatly from the majority of the data. Unsupervised anomaly detection aims to find outliers in unlabeled data, so the anomalous instances are not known in advance. Exponential growth in data generation has led to the era of Big Data. This scenario brings new challenges to classic anomaly detection problems because of the massive and unsupervised accumulation of data. Traditional methods cannot cope with the computing and time requirements of Big Data problems. Methods: In this paper, we propose four distributed algorithm designs for Big Data anomaly detection problems: HBOS_BD, LODA_BD, LSCP_BD, and XGBOD_BD. They have been designed following the MapReduce distributed methodology so that they can handle Big Data problems. Results: These algorithms have been integrated into a Spark Package named AnomalyDSD, focused on static and dynamic Big Data anomaly detection tasks. Experiments on a real-world case study have shown the performance and validity of the proposals for Big Data problems. Conclusions: With this proposal, we enable practitioners to detect anomalies in Big Data datasets efficiently and effectively, where the early detection of an anomaly can lead to a proper and timely decision.
computer science, information systems
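
The abstract states that the proposed algorithms follow the MapReduce methodology but gives no implementation detail. As a rough illustration only, the sketch below shows how a histogram-based outlier score (the general idea behind HBOS) might be split into map and reduce steps over data partitions. It is plain Python rather than Spark, it is not the AnomalyDSD code, and every function name, the equal-width binning, and the `n_bins` parameter are assumptions made for this example. It also assumes every partition is non-empty.

```python
import math
from functools import reduce

# Illustrative sketch only: NOT the AnomalyDSD / HBOS_BD implementation.
# All names below are hypothetical; partitions are assumed non-empty.

def feature_ranges(partition):
    """Map step 1: per-feature (min, max) inside one partition."""
    return [(min(col), max(col)) for col in zip(*partition)]

def merge_ranges(a, b):
    """Reduce step 1: combine two lists of per-feature (min, max)."""
    return [(min(lo1, lo2), max(hi1, hi2))
            for (lo1, hi1), (lo2, hi2) in zip(a, b)]

def bin_index(value, lo, hi, n_bins):
    """Equal-width bin of a value; the maximum falls in the last bin."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)

def local_histograms(partition, ranges, n_bins):
    """Map step 2: per-feature bin counts for one partition."""
    hists = [[0] * n_bins for _ in ranges]
    for row in partition:
        for j, value in enumerate(row):
            lo, hi = ranges[j]
            hists[j][bin_index(value, lo, hi, n_bins)] += 1
    return hists

def merge_histograms(a, b):
    """Reduce step 2: element-wise sum of two sets of histograms."""
    return [[x + y for x, y in zip(ha, hb)] for ha, hb in zip(a, b)]

def hbos_scores(partitions, n_bins=5):
    """Score every row; a higher score means a more anomalous row."""
    ranges = reduce(merge_ranges, map(feature_ranges, partitions))
    hists = reduce(
        merge_histograms,
        (local_histograms(p, ranges, n_bins) for p in partitions),
    )
    # Normalise each histogram so its tallest bin has height 1.
    heights = [[count / max(h) for count in h] for h in hists]

    def score(row):
        # Every scored row was counted, so its bin height is never 0.
        return sum(
            math.log(1.0 / heights[j][bin_index(v, *ranges[j], n_bins)])
            for j, v in enumerate(row)
        )

    # Map step 3: score rows partition by partition.
    return [[score(row) for row in p] for p in partitions]
```

In this sketch only the small per-feature ranges and bin counts move between partitions, which is what would make such a design amenable to a distributed engine; whether HBOS_BD is organised this way is not stated in the abstract.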