Random Projection with Filtering for Nearly Duplicate Search

Yue Lin,Rong Jin,Deng Cai,Xiaofei He
DOI: https://doi.org/10.1609/aaai.v26i1.8199
2021-01-01
Proceedings of the AAAI Conference on Artificial Intelligence
Abstract:High dimensional nearest neighbor search is a fundamental problem and has found applications in many domains. Although many hashing based approaches have been proposed for approximate nearest neighbor search in high dimensional space, one main drawback is that they often return many false positives that need to be filtered out by a post procedure. We propose a novel method to address this limitation in this paper. The key idea is to introduce a filtering procedure within the search algorithm, based on the compressed sensing theory, that effectively removes the false positive answers. We first obtain a sparse representation for each data point by the landmark based approach, after which we solve the nearly duplicate search that the difference between the query and its nearest neighbors forms a sparse vector living in a small ℓp ball, where p ≤ 1. Our empirical study on real-world datasets demonstrates the effectiveness of the proposed approach compared to thestate-of-the-art hashing methods.
What problem does this paper attempt to address?