Abstract: With the arrival of the big-data age, processing large amounts of data has become time-consuming and inefficient. Locality sensitive hashing (LSH) has proved to be one of the most effective methods for scalable high-dimensional nearest neighbor search. It is used in large-scale similarity search and has become an important method for handling high-dimensional data. Meanwhile, (key, value) based distributed frameworks, such as MapReduce, Memcached, and Twitter Storm, are gaining increasingly widespread use in practical applications. However, to ensure efficiency, an LSH scheme needs a rather large number of hash tables, which entails a large storage requirement. In the distributed setting, this also entails a heavy network load, since each query requires a network call per hash bucket lookup. On the other hand, some methods reduce space, and keep the scheme simple, by using few hash tables with randomly chosen hash functions; such methods find it hard to achieve high efficiency. To overcome these problems, this paper applies a learning-based algorithm in the context of (key, value) based distributed frameworks and uses it to process nearest neighbor (NN) queries with MapReduce. We propose a novel access method, called LB-LSH, which adds a learning algorithm to the basic architecture of Layered LSH to balance the load in the distributed setting and to ensure high efficiency with only O(1) hash tables. Finally, we present experimental results for LB-LSH on Hadoop, which show that the proposed method achieves better performance than state-of-the-art hashing approaches.
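To make the network-load argument concrete, the sketch below shows the general Layered LSH placement idea that LB-LSH builds on: an inner layer of k concatenated p-stable hashes produces a bucket key, and an outer LSH over that key chooses the machine, so buckets with nearby keys tend to share a machine and a query touches fewer nodes. This is a minimal illustration under stated assumptions, not the LB-LSH method itself: the learning component is omitted, Gaussian random projections are assumed for both layers, and the names `make_pstable_hash`, `LayeredLSHTable`, `w`, `D`, and `num_machines` are hypothetical.

```python
import math
import random


def make_pstable_hash(dim, w, rng):
    """One p-stable (Gaussian) LSH function h(v) = floor((a . v + b) / w).

    Assumption: Gaussian projections; the paper's learned functions are not modeled.
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

    return h


class LayeredLSHTable:
    """Hypothetical single table with Layered-LSH-style bucket placement.

    Inner layer: k concatenated hashes give the bucket key H(v).
    Outer layer: a second LSH G over H(v) picks the machine, so nearby
    bucket keys tend to be stored on the same machine.
    """

    def __init__(self, dim, k, w, D, num_machines, seed=0):
        rng = random.Random(seed)
        self.inner = [make_pstable_hash(dim, w, rng) for _ in range(k)]
        self.outer = make_pstable_hash(k, D, rng)  # hashes the k-dim integer key
        self.num_machines = num_machines
        # In a real deployment each bucket would live on machine_for(key);
        # here every bucket is kept in one local dict for illustration.
        self.buckets = {}

    def bucket_key(self, v):
        return tuple(h(v) for h in self.inner)

    def machine_for(self, key):
        # Python's % with a positive modulus always yields 0..num_machines-1.
        return self.outer(key) % self.num_machines

    def insert(self, pid, v):
        self.buckets.setdefault(self.bucket_key(v), []).append(pid)

    def query(self, q):
        """Return (machine to contact, candidate ids in the query's bucket)."""
        key = self.bucket_key(q)
        return self.machine_for(key), self.buckets.get(key, [])


# Usage: index three points, then look up the bucket of point 0.
table = LayeredLSHTable(dim=4, k=3, w=4.0, D=8.0, num_machines=16, seed=42)
points = {0: [1.0, 2.0, 3.0, 4.0], 1: [1.1, 2.0, 2.9, 4.0], 2: [-5.0, 9.0, 0.0, 7.0]}
for pid, v in points.items():
    table.insert(pid, v)
machine, candidates = table.query(points[0])
```

With plain multi-table LSH, each of the L tables typically sends a query to an unrelated machine; the outer hash here is what lets nearby buckets (for example, those probed for a slightly perturbed query) be served by the same node.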

DSH: Data Sensitive Hashing for High-Dimensional k-NN Search

Density Sensitive Hashing

DB-LSH: Locality-Sensitive Hashing with Query-based Dynamic Bucketing

Distribution-Aware Locality Sensitive Hashing

Preserving-Ignoring Transformation Based Index for Approximate k Nearest Neighbor Search

Data-Oriented Locality Sensitive Hashing

Efficient Locality-Sensitive Hashing over High-Dimensional Data Streams

Experimental Analysis of Locality Sensitive Hashing Techniques for High-Dimensional Approximate Nearest Neighbor Searches

Learning-Based Distributed Locality Sensitive Hashing

LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index

A Robust Method Based on Locality Sensitive Hashing for K-Nearest Neighbors Searching

Efficient Locality-Sensitive Hashing over High-Dimensional Streaming Data

Distributed Online Similarity Search in High Dimensional Space

Data-Dependent Locality Sensitive Hashing

Bi-Level Locality Sensitive Hashing Index Based on Clustering

Frequency Based Locality Sensitive Hashing

Improving Locality Sensitive Hashing by Efficiently Finding Projected Nearest Neighbors

Locality-Sensitive Hashing for Finding Nearest Neighbors in Probability Distributions

Improving Similarity Search with High-dimensional Locality-sensitive Hashing

In Defense of Locality-Sensitive Hashing

DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search