k-NNN: Nearest Neighbors of Neighbors for Anomaly Detection

Ori Nizan,Ayellet Tal
2023-05-28
Abstract:Anomaly detection aims at identifying images that deviate significantly from the norm. We focus on algorithms that embed the normal training examples in space and when given a test image, detect anomalies based on the features distance to the k-nearest training neighbors. We propose a new operator that takes into account the varying structure & importance of the features in the embedding space. Interestingly, this is done by taking into account not only the nearest neighbors, but also the neighbors of these neighbors (k-NNN). We show that by simply replacing the nearest neighbor component in existing algorithms by our k-NNN operator, while leaving the rest of the algorithms untouched, each algorithms own results are improved. This is the case both for common homogeneous datasets, such as flowers or nuts of a specific type, as well as for more diverse datasets
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem that this paper attempts to solve is the limitations of the traditional k - nearest neighbor (k - NN) method in dealing with feature space structure and feature importance in anomaly detection. Specifically, when the normal data set is diverse or homogeneous, the traditional k - NN method may not be able to effectively distinguish between abnormal points and normal points. For example, in a scenario where the normal data set is diverse, a sample that is not similar to any normal sample may be misidentified as normal because its distance to the nearest neighbor is less than the distance between normal samples in different regions. In addition, in the case of a homogeneous normal data set, abnormal points may be very close to normal points in certain directions, making them difficult to detect. To overcome these problems, the paper proposes a new operator - the neighbors of k - nearest neighbors (k - NNN). This operator not only considers the nearest neighbors but also the neighbors of these neighbors, thus better reflecting the structure of the feature space and the importance of features. In this way, k - NNN can improve the performance of existing anomaly detection algorithms on different data sets, whether the data sets are homogeneous or diverse. ### Main contributions of the paper: 1. **Introduced a new general, efficient and accurate operator** - the k - NNN operator, which provides a way to view data between local and global perspectives, and is applicable to both diverse and homogeneous normal data sets. 2. **Proposed a new normalization scheme** that assigns greater weights to smaller feature values to address the challenges posed by small data sets. ### Method overview: - **Feature extraction**: In the training phase, feature vectors are extracted from each training image, and the k nearest neighbors of each feature vector are calculated. - **Calculation of feature vectors and feature values**: For each neighbor, its feature vector and feature value are calculated and this information is stored. - **Calculation of anomaly scores**: In the inference phase, given a test point and its feature vector, its k nearest neighbors are found, and the anomaly score is calculated based on the feature vectors and feature values of these neighbors. The formula for calculating the anomaly score is as follows: \[ AS(f)=\sum_{i = 1}^{k}\sum_{j = 1}^{n}\left|(f - f_{i})\cdot v_{ij}\right|\cdot\frac{1}{\sqrt{e_{ij}}} \] where \(f\) is the feature vector of the test point, \(f_{i}\) is the feature vector of the \(i\) - th nearest neighbor, and \(v_{ij}\) and \(e_{ij}\) are the \(j\) - th feature vector and feature value of the \(i\) - th neighbor, respectively. - **Feature segmentation and re - ordering**: To more effectively estimate feature vectors, the paper proposes to divide the feature vector into multiple parts and re - order them based on the correlation between features to ensure that each part contains the most relevant features. ### Experimental results: - **Improving existing anomaly detection methods**: The paper conducted experiments on multiple data sets, and the results show that replacing the existing k - NN component with the k - NNN operator can significantly improve the performance of anomaly detection. - **Synthetic benchmark tests**: On synthetic data sets, the k - NNN method outperforms other variants of the k - NN method, especially when dealing with structured feature space. ### Conclusion: The k - NNN operator proposed in the paper effectively solves the limitations of the traditional k - NN method in anomaly detection by considering the structure of the feature space and the importance of features, and improves the performance of multiple anomaly detection algorithms.