Data-Dependent Locality Sensitive Hashing

Hongtao Xie, Zhineng Chen, Yizhi Liu, Jianlong Tan, Li Guo
DOI: https://doi.org/10.1007/978-3-319-13168-9_32
2014-01-01
Abstract: Locality sensitive hashing (LSH) is the most popular algorithm for approximate nearest neighbor (ANN) search. Because LSH partitions the vector space uniformly while the distribution of vectors is usually non-uniform, it fits real datasets poorly and has limited performance. In this paper, we propose a new data-dependent LSH algorithm that uses a two-level structure to perform ANN search in high-dimensional spaces. In the first level, we train a number of cluster centers and use them to divide the dataset into many clusters, so that the vectors in each cluster have a nearly uniform distribution. In the second level, we construct LSH tables for each cluster. Given a query, we first determine a few clusters that it belongs to with high probability, and then perform ANN search in the corresponding LSH tables. Experimental results on the reference datasets show that the search speed can be increased by 48 times compared to E2LSH, while keeping high search precision.
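The two-level structure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the cluster centers are trained, how the candidate clusters are chosen, or which hash family is used, so the sketch assumes plain k-means for the first level, the nearest `n_probe` centers as the "high probability" clusters, and an E2LSH-style hash `floor((a·v + b) / w)` for the second level. All names (`train_centers`, `ClusterLSH`, `TwoLevelIndex`) and parameter defaults are hypothetical.

```python
import numpy as np
from collections import defaultdict


def train_centers(data, k, iters=10, seed=0):
    """Plain Lloyd's k-means (an assumption); returns a (k, dim) array."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every vector to every center.
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            members = data[labels == c]
            if len(members):  # an empty cluster keeps its previous center
                centers[c] = members.mean(axis=0)
    return centers


class ClusterLSH:
    """LSH tables for one cluster, hashing with floor((a.v + b) / w)."""

    def __init__(self, dim, n_tables, n_bits, w, rng):
        self.a = rng.normal(size=(n_tables, n_bits, dim))
        self.b = rng.uniform(0.0, w, size=(n_tables, n_bits))
        self.w = w
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _keys(self, v):
        # One tuple of n_bits integer hash values per table.
        codes = np.floor((self.a @ v + self.b) / self.w).astype(int)
        return [tuple(row) for row in codes]

    def add(self, idx, v):
        for table, key in zip(self.tables, self._keys(v)):
            table[key].append(idx)

    def candidates(self, q):
        found = set()
        for table, key in zip(self.tables, self._keys(q)):
            found.update(table.get(key, ()))
        return found


class TwoLevelIndex:
    """Level 1: cluster centers. Level 2: separate LSH tables per cluster."""

    def __init__(self, data, k=4, n_tables=8, n_bits=4, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.data = data
        self.centers = train_centers(data, k, seed=seed)
        self.lsh = [ClusterLSH(data.shape[1], n_tables, n_bits, w, rng)
                    for _ in range(k)]
        # Assign each vector to its nearest center and index it there.
        d2 = ((data[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        for idx, c in enumerate(d2.argmin(axis=1)):
            self.lsh[c].add(idx, data[idx])

    def query(self, q, n_probe=2, top=1):
        # Probe only the n_probe clusters whose centers are closest to q.
        d2 = ((self.centers - q) ** 2).sum(axis=1)
        cand = set()
        for c in np.argsort(d2)[:n_probe]:
            cand |= self.lsh[c].candidates(q)
        cand = np.array(sorted(cand), dtype=int)
        if cand.size == 0:
            return cand
        # Rank the collected candidates by exact Euclidean distance.
        dist = np.linalg.norm(self.data[cand] - q, axis=1)
        return cand[np.argsort(dist)[:top]]
```

With `index = TwoLevelIndex(data)` built over an `(n, dim)` array, `index.query(q, n_probe=2, top=5)` returns the indices of up to five candidate neighbors of `q`. Raising `n_probe` probes more clusters and trades speed for recall, which is the lever the abstract's "a few clusters" suggests.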