Combining KNN with AutoEncoder for Outlier Detection
Shu-Zheng Liu, Shuai Ma, Han-Qing Chen, Li-Zhen Cui, Jie Ding
DOI: https://doi.org/10.1007/s11390-023-2403-y
IF: 1.871
2024-12-08
Journal of Computer Science and Technology
Abstract: K-nearest neighbor (KNN) is one of the most fundamental methods for unsupervised outlier detection because of its various advantages, e.g., ease of use and relatively high accuracy. Currently, most data analytic tasks need to deal with high-dimensional data, and KNN-based methods often fail on such data due to "the curse of dimensionality". AutoEncoder-based methods have recently been introduced to use reconstruction errors for outlier detection on high-dimensional data, but the direct use of an AutoEncoder typically does not preserve the data proximity relationships well enough for outlier detection. In this study, we propose to combine KNN with AutoEncoder for outlier detection. First, we propose the Nearest Neighbor AutoEncoder (NNAE), which preserves the original data proximity in a much lower dimension that is more suitable for performing KNN. Second, we propose the K-nearest reconstruction neighbors (KNRNs), which incorporate the reconstruction errors of NNAE with the K-distances of KNN to detect outliers. Third, we develop a method to automatically choose better parameters for optimizing the structure of NNAE. Finally, using five real-world datasets, we experimentally show that our proposed approach NNAE+KNRN is much better than existing methods, i.e., KNN, Isolation Forest, a traditional AutoEncoder using reconstruction errors (AutoEncoder-RE), and Robust AutoEncoder.
computer science, software engineering, hardware & architecture
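The general idea in the abstract, scoring each point by both its reconstruction error and its K-distance in a lower-dimensional code space, can be illustrated with a rough sketch. This is not the paper's NNAE or KNRN: the abstract does not give the network structure or the rule for combining the two signals. The sketch assumes a PCA projection as a linear stand-in for the AutoEncoder and simply adds the two min-max-normalized scores. All function names and the toy data are hypothetical.

```python
import numpy as np


def k_distance(Z, k):
    """Distance from each row of Z to its k-th nearest other row."""
    diff = Z[:, None, :] - Z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting, column 0 of each row is the point itself (distance 0).
    return np.sort(dist, axis=1)[:, k]


def linear_encode_decode(X, latent_dim):
    """PCA stand-in for an AutoEncoder (assumption, not the paper's NNAE).

    Returns the low-dimensional codes and the per-point reconstruction errors.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    W = vt[:latent_dim].T            # projection onto the top principal axes
    Z = Xc @ W                       # "encoder" output
    X_hat = Z @ W.T + mean           # "decoder" output
    rec_err = np.linalg.norm(X - X_hat, axis=1)
    return Z, rec_err


def min_max(s):
    """Rescale a score vector to [0, 1]; a constant vector maps to zeros."""
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)


def combined_outlier_scores(X, latent_dim=2, k=5):
    """Sum of normalized K-distance (in code space) and reconstruction error.

    The additive combination is an illustrative assumption.
    """
    Z, rec_err = linear_encode_decode(X, latent_dim)
    return min_max(k_distance(Z, k)) + min_max(rec_err)


# Toy data: 200 inliers near a 2-D plane inside a 10-D space, plus one
# point that is both far away within the plane and displaced off it.
rng = np.random.default_rng(0)
inliers = rng.normal(scale=0.05, size=(200, 10))
inliers[:, :2] += rng.normal(size=(200, 2))
outlier = np.zeros((1, 10))
outlier[0, :2] = 6.0   # far from the inliers within the plane
outlier[0, 2] = 3.0    # displaced off the plane
X = np.vstack([inliers, outlier])

scores = combined_outlier_scores(X)
print("highest-scoring index:", int(np.argmax(scores)))
```

In this toy setup the planted point is extreme on both signals. A point that is extreme on only one of them can tie with a borderline inlier under a plain sum, so any real combination rule would need more care than this sketch takes.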