Frequency Based Locality Sensitive Hashing

Kang Ling, Gangshan Wu
DOI: https://doi.org/10.1109/ICMT.2011.6002015
2011-01-01
Multimedia Technology
Abstract: Nearest Neighbor (NN) search is of major importance to many applications, such as information retrieval and data mining. However, finding the NN in high-dimensional space has proven time-consuming. In recent years, Locality Sensitive Hashing (LSH) has been proposed to solve the Approximate Nearest Neighbor (ANN) problem. The main drawback of LSH is that it requires a large amount of memory to achieve good performance, which makes it poorly suited to today's massive-data applications. We analyze the generic LSH scheme as well as the properties of LSH hash functions based on p-stable distributions, and propose a new LSH scheme called Frequency Based Locality Sensitive Hashing (FBLSH). FBLSH uses a single function based on p-stable distributions as the hash function of each hash table, and it sets a frequency threshold m: only points that collide with the query point more than m times can be candidate ANNs. FBLSH is easy to implement, and through experiments we show that it can reduce the extra space cost by several orders of magnitude with less (or similar) time cost while achieving better search quality compared with LSH based on p-stable distributions.
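The scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `FrequencyLSH`, its parameters (`num_tables`, `bucket_width`), and the choice of the 2-stable (Gaussian) hash h(v) = floor((a·v + b) / w) are assumptions made for illustration, since the abstract does not give these details. Each table uses exactly one hash function, and a point becomes a candidate only if it collides with the query in more than m tables.

```python
import math
import random
from collections import Counter, defaultdict


class FrequencyLSH:
    """Toy frequency-based LSH index (hypothetical names and parameters)."""

    def __init__(self, dim, num_tables, bucket_width, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # One (a, b) pair per table: a has i.i.d. N(0, 1) entries (2-stable),
        # b is a random offset drawn uniformly from the bucket width.
        self.funcs = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)],
             rng.uniform(0.0, bucket_width))
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def _hash(self, table_idx, v):
        # h(v) = floor((a . v + b) / w)
        a, b = self.funcs[table_idx]
        return math.floor((sum(x * y for x, y in zip(a, v)) + b) / self.w)

    def add(self, v):
        """Store v in every table and return its index."""
        idx = len(self.points)
        self.points.append(tuple(v))
        for t, table in enumerate(self.tables):
            table[self._hash(t, v)].append(idx)
        return idx

    def candidates(self, q, m):
        """Indices of points that collide with q in more than m tables."""
        counts = Counter()
        for t, table in enumerate(self.tables):
            counts.update(table.get(self._hash(t, q), ()))
        return [i for i, c in counts.items() if c > m]

    def query(self, q, m):
        """Nearest candidate by Euclidean distance, or None if there is none."""
        cands = self.candidates(q, m)
        if not cands:
            return None
        return min(cands, key=lambda i: math.dist(q, self.points[i]))


index = FrequencyLSH(dim=2, num_tables=16, bucket_width=4.0)
for p in [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0)]:
    index.add(p)
# An exact match collides with the query in every table, so it always
# passes the threshold and is returned as the nearest candidate.
print(index.query((0.0, 0.0), m=8))  # prints 0
```

Because each table stores a single integer hash rather than a concatenation of several, nearby points collide in many tables and distant points in few, which is why a collision count can serve as the filter. Suitable values for the threshold and the number of tables depend on the data and are not specified here.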