Real-time distance-based outlier detection in data streams
Luan Tran,Min Y. Mun,Cyrus Shahabi
DOI: https://doi.org/10.14778/3425879.3425885
IF: 2.5
2020-10-01
Proceedings of the VLDB Endowment
Abstract:Real-time outlier detection in data streams has drawn much attention recently as many applications need to be able to detect abnormal behaviors as soon as they occur. The arrival and departure of streaming data on edge devices impose new challenges to process the data quickly in real-time due to memory and CPU limitations of these devices. Existing methods are slow and not memory efficient as they mostly focus on quick detection of inliers and pay less attention to expediting neighbor searches for outlier candidates. In this study, we propose a new algorithm, CPOD, to improve the efficiency of outlier detections while reducing its memory requirements. CPOD uses a unique data structure called "core point" with multi-distance indexing to both quickly identify inliers and reduce neighbor search spaces for outlier candidates. We show that with six real-world and one synthetic dataset, CPOD is, on average, 10, 19, and 73 times faster than M_MCOD, NETS, and MCOD, respectively, while consuming low memory.
computer science, information systems, theory & methods