Efficient Indexing of Binary LSH for High Dimensional Nearest Neighbor

Xiaoyu Zhang, Manlin Wang, Jiangtao Cui
DOI: https://doi.org/10.1016/j.neucom.2016.05.095
IF: 6
2016-01-01
Neurocomputing
Abstract: Approximate Nearest Neighbor (ANN) search is one of the most frequently used and yet expensive operations in high-dimensional databases, especially multimedia databases involving massive numbers of high-dimensional feature vectors. Recently, Locality-Sensitive Hashing (LSH) and its variants have been generally acknowledged as promising solutions to ANN search due to their excellent performance and ease of implementation. However, the existing LSH methods for external memory usually create a linear order on the set of compound hash keys to rearrange the data set in external storage, which incurs many false negatives. In this paper, we propose a surprisingly simple method that solves the ANN problem with high accuracy while requiring only a limited number of random I/Os. Because LSH functions are distance-preserving, the idea of collision counting is used to guarantee the accuracy of the returned neighbors. We convert the dynamic collision counting problem into nearest neighbor (NN) search in Hamming space. To support NN search in Hamming space, we establish a multi-index hashing structure to rearrange the binary hash keys and their corresponding objects. Accessing the candidates consumes a limited number of random I/Os and yields a shorter response time. Experimental results show that our method achieves higher accuracy of ANN results than state-of-the-art methods including LSB, C2LSH and SK-LSH. © 2016 Elsevier B.V. All rights reserved.
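The abstract's central step, turning collision counting over binary hash keys into a Hamming-space search served by multi-index hashing, can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the sign-of-random-projection hash, the class and function names (`MultiIndexHash`, `sign_hash`, `random_planes`, `split`, `neighbors_within`) and all parameters are assumptions made for illustration, and the external-memory layout the paper targets is not modeled. The sketch relies on two standard facts: for two b-bit keys, the number of agreeing bits is b minus their Hamming distance, and if two keys differ in at most r bits, then at least one of their m substrings differs in at most floor(r / m) bits.

```python
import random
from collections import defaultdict
from itertools import combinations


def random_planes(dim, n_bits, seed=0):
    """Hypothetical helper: n_bits random Gaussian hyperplanes in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def sign_hash(vec, planes):
    """Binary LSH key (assumed scheme): one bit per hyperplane, set if the dot product is >= 0."""
    bits = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def split(code, n_bits, m):
    """Split an n_bits-wide code into m equal-width substrings, lowest bits first."""
    width = n_bits // m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]


def neighbors_within(value, width, radius):
    """Yield every width-bit value within Hamming distance `radius` of `value`."""
    for r in range(radius + 1):
        for positions in combinations(range(width), r):
            flipped = value
            for pos in positions:
                flipped ^= 1 << pos
            yield flipped


class MultiIndexHash:
    """One hash table per substring; candidates are verified on the full code."""

    def __init__(self, n_bits, m):
        assert n_bits % m == 0, "sketch assumes equal-width substrings"
        self.n_bits, self.m = n_bits, m
        self.tables = [defaultdict(list) for _ in range(m)]
        self.codes = []

    def add(self, code):
        idx = len(self.codes)
        self.codes.append(code)
        for table, sub in zip(self.tables, split(code, self.n_bits, self.m)):
            table[sub].append(idx)
        return idx

    def search(self, query, radius):
        """Return ids of all stored codes within `radius` bits of `query`."""
        width = self.n_bits // self.m
        sub_radius = radius // self.m  # pigeonhole bound on one substring
        candidates = set()
        for table, sub in zip(self.tables, split(query, self.n_bits, self.m)):
            for key in neighbors_within(sub, width, sub_radius):
                candidates.update(table.get(key, ()))
        # Verification step: keep only true neighbors in full Hamming distance.
        return sorted(
            i for i in candidates
            if bin(self.codes[i] ^ query).count("1") <= radius
        )
```

In this sketch, each data vector would be hashed once with `sign_hash`, stored with `add`, and a query would call `search` with growing radii until enough neighbors are found. Probing substrings keeps the number of bucket lookups far below what enumerating all full-length keys within the radius would require, which is the property that makes a bounded number of random accesses plausible.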