Learning-based distributed locality sensitive hashing.
Jia Shi, Zhaobin Liu, Zhiyang Li, Chang Liu, Wenyu Qu
IF: 4.397
2017-01-01
Computer Systems Science and Engineering
Abstract: In the big-data era, processing large amounts of data is often time-consuming and inefficient. Locality sensitive hashing (LSH) has proved to be one of the most effective methods for scalable high-dimensional nearest neighbor search. (Key, value)-based distributed frameworks, such as MapReduce, Memcached, and Twitter Storm, are gaining increasingly widespread use in practical applications. For example, LSH is used in large-scale similarity search, and it has also become an important method for handling high-dimensional data. However, to ensure efficiency, an LSH scheme needs a rather large number of hash tables, which entails a large storage requirement. In the distributed setting, this also entails a heavy network load, because each query requires one network call per hash bucket lookup. Meanwhile, to reduce space and for the sake of simplicity, some methods use few hash tables and choose hash functions randomly, which makes it hard to achieve high efficiency. To overcome these problems, this paper applies a learning-based algorithm in the context of (key, value)-based distributed frameworks and uses it to process nearest neighbor (NN) queries with MapReduce. We propose a novel access method, called LB-LSH, which adds a learning algorithm to the basic architecture of Layered LSH in order to balance load in the distributed setting and maintain high efficiency with O(1) hash tables. Finally, we present experimental results for LB-LSH on Hadoop, which show that the proposed method achieves better performance than state-of-the-art hashing approaches.
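To make the storage and lookup trade-off described in the abstract concrete, the following is a minimal single-machine sketch of basic random-projection LSH for Euclidean distance. It is not LB-LSH or Layered LSH, and it does not use MapReduce; it is the plain randomly chosen hash-function scheme that the abstract contrasts against. All names and parameters (`make_hash`, `bucket_key`, `build_index`, `query`, `num_tables`, `k`, `w`) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, k, w, rng):
    """Draw k random projections (a, b) that together form one table's hash."""
    return [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
        for _ in range(k)
    ]


def bucket_key(point, funcs, w):
    """Concatenate k quantized projections floor((a.x + b) / w) into a bucket key."""
    return tuple(
        math.floor((sum(a_i * x_i for a_i, x_i in zip(a, point)) + b) / w)
        for a, b in funcs
    )


def build_index(points, dim, num_tables, k, w, seed=0):
    """Build num_tables independent hash tables; storage grows with num_tables."""
    rng = random.Random(seed)
    tables = []
    for _ in range(num_tables):
        funcs = make_hash(dim, k, w, rng)
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            buckets[bucket_key(p, funcs, w)].append(idx)
        tables.append((funcs, buckets))
    return tables


def query(q, points, tables, w):
    """Look up one bucket per table, then rank the candidates by exact distance.

    In a (key, value) store, each bucket lookup here would be one network call.
    Returns the index of the closest candidate, or None if every bucket is empty.
    """
    candidates = set()
    for funcs, buckets in tables:
        candidates.update(buckets.get(bucket_key(q, funcs, w), []))
    if not candidates:
        return None
    return min(candidates, key=lambda i: math.dist(q, points[i]))
```

A call such as `tables = build_index(points, dim=8, num_tables=4, k=3, w=4.0)` followed by `query(q, points, tables, w=4.0)` performs four bucket lookups per query. Raising `num_tables` generally improves recall at the cost of more storage and more lookups, which is the tension the abstract describes.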