Locality-Sensitive Hashing for Finding Nearest Neighbors in Probability Distributions.

Yi-Kun Tang, Xianling Mao, Yi-Jing Hao, Cheng Xu, Heyan Huang
DOI: https://doi.org/10.1007/978-981-10-6805-8_1
2017-01-01
Abstract: In the past ten years, powerful new algorithms based on efficient data structures have been proposed for Approximate Nearest Neighbor (ANN) search. Existing Locality-Sensitive Hashing (LSH) algorithms for vector-type data can be applied directly to find nearest neighbors in probability-distribution data. However, these methods do not exploit the special properties of probability distributions. In this paper, based on those properties, we present a novel LSH scheme adapted to angular distance for ANN search over high-dimensional probability distributions. We define the specific hash functions and prove their locality sensitivity. We also propose a Sequential Interleaving algorithm based on the "Unbalance Effect" of the Euclidean and angular metrics for probability distributions. Finally, we experimentally compare our methods with state-of-the-art LSH algorithms for ANN search on six public image databases. The results show that the proposed algorithms provide far better accuracy than the baselines.
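For orientation, the kind of generic vector-type baseline the abstract refers to can be sketched with sign random projections, a standard LSH family for angular distance. This is not the scheme proposed in the paper, whose hash functions are not given in the abstract; the function names, the bit count, and the example distributions below are illustrative assumptions.

```python
import math
import random


def angular_distance(p, q):
    """Angle in radians between two vectors (here, probability distributions)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos = max(-1.0, min(1.0, dot / (norm_p * norm_q)))
    return math.acos(cos)


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_distribution(p, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane p falls on."""
    return tuple(
        1 if sum(w * a for w, a in zip(plane, p)) >= 0 else 0
        for plane in hyperplanes
    )


def bit_agreement(code_a, code_b):
    """Fraction of matching bits between two hash codes."""
    return sum(x == y for x, y in zip(code_a, code_b)) / len(code_a)


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, n_bits=256, seed=0)
    p = [0.50, 0.30, 0.20]
    q = [0.48, 0.32, 0.20]  # close to p
    r = [0.05, 0.05, 0.90]  # far from p
    for name, other in (("q", q), ("r", r)):
        agree = bit_agreement(
            hash_distribution(p, planes), hash_distribution(other, planes)
        )
        print(name, round(angular_distance(p, other), 3), round(agree, 3))
```

For this family, each bit of two vectors at angle θ agrees with probability 1 − θ/π, so distributions at a small angle share most of their bits. Because probability distributions have non-negative entries, the angle between any two of them is at most π/2, which is one example of a property that a generic vector-type hash does not take into account.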