CELOF: Effective and fast memory efficient local outlier detection in high-dimensional data streams
Liang Chen,Wei Wang,Yun Yang
DOI: https://doi.org/10.1016/j.asoc.2021.107079
IF: 8.7
2021-04-01
Applied Soft Computing
Abstract:<p>Outlier detection is an important and challenging problem in industrial automation, where data are often collected in large volumes but with little labeled information. Many models have been proposed in academia for real-time outlier detection on data streams. However, most existing outlier detection algorithms still have two main limitations: (1) they need a large amount of memory to store data, and (2) they detect outliers poorly in high-dimensional data in application scenarios. In this paper, we propose a new algorithm, called CELOF, which can effectively overcome both limitations. In CELOF, we first use information entropy to construct a new index weight calculation method, which distinguishes the influence of different indexes and improves detection accuracy on multi-dimensional data. Next, we design a new reachable distance factor discrimination method to extract the information in the original data, and then propose a new outlier detection strategy that greatly reduces the amount of data stored. Experimental results show that CELOF improves accuracy by 15% on average compared with state-of-the-art algorithms, and that its running time is less than 1% of that of the original LOF. Additionally, our comprehensive experiments use different real data sets for simulation, and the results show that our algorithm can be widely used in different practical application scenarios without any prior information about the data or its distribution.</p>
computer science, artificial intelligence, interdisciplinary applications
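The abstract's first step, weighting each index (feature) by information entropy, can be illustrated with the textbook entropy weight method. The sketch below is not CELOF's actual formulation, which the abstract does not spell out. The function name `entropy_weights`, the min-max scaling step, and the handling of constant columns are all assumptions made for illustration.

```python
import numpy as np


def entropy_weights(X):
    """Textbook entropy weight method (assumed here, not taken from the paper).

    X is an (n_samples, n_features) array with n_samples >= 2.
    Returns one non-negative weight per feature; the weights sum to 1.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Min-max scale each column to [0, 1] so the proportions are non-negative.
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    safe_range = np.where(col_range > 0, col_range, 1.0)
    Z = (X - col_min) / safe_range

    # Turn each column into a distribution over samples. A constant column
    # carries no information, so it is treated as uniform (maximum entropy).
    col_sum = Z.sum(axis=0)
    safe_sum = np.where(col_sum > 0, col_sum, 1.0)
    P = np.where(col_sum > 0, Z / safe_sum, 1.0 / n)

    # Normalized Shannon entropy per column, using the convention 0*log(0) = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P), 0.0)
    entropy = -terms.sum(axis=0) / np.log(n)

    # Lower entropy means a more discriminative feature, hence a larger weight.
    divergence = np.clip(1.0 - entropy, 0.0, None)
    total = divergence.sum()
    if total == 0:
        return np.full(d, 1.0 / d)  # every feature is uninformative
    return divergence / total


if __name__ == "__main__":
    # Column 0 is constant, column 1 is evenly spread, column 2 has one spike.
    X = [[1, 0, 0], [1, 1, 0], [1, 2, 0], [1, 3, 9]]
    print(entropy_weights(X))
```

Under these assumptions, a constant feature receives zero weight and a feature concentrated in a few samples receives the most. Weights of this kind could then scale per-feature differences inside a distance computation, which is one plausible reading of how index weights would feed a LOF-style detector.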