Neighborhood Voting: A Novel Search Scheme for Hashing.
Yan Xiao, Jiafeng Guo, Yanyan Lan, Jun Xu, Xueqi Cheng
DOI: https://doi.org/10.1145/3269206.3269240
2018-01-01
Abstract: Hashing techniques for approximate nearest neighbor search (ANNS) encode data points into short binary codes while trying to preserve the neighborhood structure of the original data as much as possible. With binary codes, ANNS can be conducted efficiently over large-scale datasets, owing to the high efficiency of pairwise comparison with the Hamming distance. Although binary codes have low computation and storage costs, the data are compressed so heavily that part of the neighborhood structure information is inevitably lost. To address this issue, we propose to introduce the k-nearest neighbors (k-NNs) from the original space into the Hamming space (i.e., associating a binary code with its original k-NNs) to enhance the effectiveness of existing hashing techniques with little overhead. Based on this idea, we develop a novel search scheme for hashing techniques, namely neighborhood voting: each point retrieved by a query code votes for its neighbors and itself, and the more votes a point receives, the better a candidate it is. In this way, search in hashing is not simply the collision between codes (i.e., query code and candidate code), but also the collision between neighbors (i.e., neighbors of candidate points). The underlying assumption is that the true neighbors of a query point should be close to each other, while points that have similar binary codes but are seldom neighbors of other candidate points are likely false positives. We introduce a novel data structure called the aggregated hash table to implement our idea and accelerate the online search process. Experimental results show that our search scheme can significantly improve search effectiveness while maintaining good efficiency across different existing hashing techniques.
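The voting step described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: binary codes are stored as plain integers, candidates are retrieved by a linear scan within a Hamming radius, and the k-NN lists of the original space are precomputed offline. All names (`hamming`, `neighborhood_voting`, `codes`, `knn`, `radius`, `top_n`) are hypothetical, and the aggregated hash table that the paper uses to accelerate online search is not reproduced here.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


def neighborhood_voting(query_code, codes, knn, radius, top_n):
    """Rank points by the votes they collect from retrieved points.

    codes  : list of int, binary code of each database point (assumed layout)
    knn    : list of lists, knn[i] holds the indices of point i's k-NNs
             in the original space (assumed precomputed offline)
    radius : Hamming radius used for the initial retrieval (assumption)
    """
    # Step 1: code collision. A linear scan stands in for any hash index.
    retrieved = [i for i, c in enumerate(codes)
                 if hamming(query_code, c) <= radius]

    # Step 2: each retrieved point votes for itself and its neighbors.
    votes = Counter()
    for i in retrieved:
        votes[i] += 1
        for j in knn[i]:
            votes[j] += 1

    # Step 3: more votes means a better candidate; ties broken by index.
    ranked = sorted(votes, key=lambda i: (-votes[i], i))
    return ranked[:top_n], votes


# Toy data: points 0, 1, 2 are mutual neighbors in the original space;
# point 4 shares a code with point 1 but belongs to a different neighborhood.
codes = [0b0000, 0b0001, 0b0011, 0b1111, 0b0001]
knn = [[1, 2], [0, 2], [0, 1], [4, 2], [3, 0]]

top, votes = neighborhood_voting(0b0000, codes, knn, radius=1, top_n=3)
print(top, dict(votes))
```

In this toy example, point 2 lies outside the Hamming radius yet is recovered through the votes of its neighbors, while point 4 collides with the query code but collects only its own vote, which matches the false-positive intuition stated in the abstract.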