On the Consistency of Exact and Approximate Nearest Neighbor with Noisy Data.
Wei Gao, Xin-Yi Niu, Zhi-Hua Zhou
2016-01-01
Abstract: Nearest neighbor has been one of the simplest and most appealing non-parametric approaches in machine learning, pattern recognition, computer vision, etc. Empirical studies show that $k$-nearest neighbor is resistant to noise, yet this behavior is not well understood theoretically. This work presents a consistency analysis of exact and approximate nearest neighbor in the random noise setting. Our theoretical studies show that $k$-nearest neighbor attains the same consistency rate in the noise setting as in the noise-free setting, which verifies the robustness of $k$-nearest neighbor to random noise. The nearest neighbor (1-NN), however, is proven to be biased by random noise. For approximate $k$-nearest neighbor, we first generalize the Johnson-Lindenstrauss lemma to infinite sets, and based on this result, we show that approximate $k$-nearest neighbor is as robust to random noise as exact $k$-nearest neighbor and achieves a faster convergence rate, though with a tradeoff between consistency and the reduced dimension. Specifically, approximate $k$-nearest neighbor with sharp dimensionality reduction tends to cause a large deviation from the Bayes risk.
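The contrast the abstract draws between 1-NN and $k$-nearest neighbor under random label noise can be illustrated with a small simulation. The sketch below is not the authors' experiment or analysis. It assumes two well-separated Gaussian classes, symmetric label flipping with probability `tau`, and a Gaussian random projection as a stand-in for the approximate variant. All function names and parameter values (`make_data`, `run_experiment`, `tau=0.3`, `k=25`, `m=5`) are hypothetical choices made for illustration.

```python
import random

def make_data(n, d, rng, shift=1.0):
    """Two Gaussian classes in R^d with means -shift and +shift on every coordinate."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.randrange(2)
        mu = shift if y == 1 else -shift
        xs.append([rng.gauss(mu, 1.0) for _ in range(d)])
        ys.append(y)
    return xs, ys

def flip_labels(ys, tau, rng):
    """Assumed noise model: flip each label independently with probability tau."""
    return [1 - y if rng.random() < tau else y for y in ys]

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, x)), y)
        for p, y in zip(train_x, train_y)
    )
    votes = sum(y for _, y in dists[:k])
    return 1 if 2 * votes > k else 0

def random_projection(d, m, rng):
    """m x d Gaussian matrix with N(0, 1/m) entries, a common Johnson-Lindenstrauss map."""
    return [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(d)] for _ in range(m)]

def project(matrix, xs):
    """Apply the projection matrix to every point in xs."""
    return [[sum(r * v for r, v in zip(row, x)) for row in matrix] for x in xs]

def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose k-NN prediction matches the clean test label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)

def run_experiment(seed=0, n_train=400, n_test=200, d=20, m=5, tau=0.3, k=25):
    rng = random.Random(seed)
    train_x, train_y = make_data(n_train, d, rng)
    test_x, test_y = make_data(n_test, d, rng)
    noisy_y = flip_labels(train_y, tau, rng)  # noise affects training labels only
    proj = random_projection(d, m, rng)
    return {
        "1nn_noisy": accuracy(train_x, noisy_y, test_x, test_y, 1),
        "knn_noisy": accuracy(train_x, noisy_y, test_x, test_y, k),
        "knn_noisy_projected": accuracy(
            project(proj, train_x), noisy_y, project(proj, test_x), test_y, k
        ),
    }

results = run_experiment()
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Under these assumptions, 1-NN tends to inherit the flipped label of its single nearest neighbor, while the majority vote over `k` neighbors averages the flips out. Lowering `m` in this sketch is one way to explore how a sharper dimensionality reduction affects accuracy, though a single random projection gives only a noisy impression of that tradeoff.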