CLSH: Cluster-based Locality-Sensitive Hashing

Xiangyang Xu, Tongwei Ren, Gangshan Wu
DOI: https://doi.org/10.1145/2632856.2632868
2014-01-01
Abstract: Locality-sensitive hashing (LSH) usually consumes a large amount of memory in similarity search, which limits its scalability for large-scale applications. In this paper, we propose a novel cluster-based locality-sensitive hashing (CLSH) approach, which extends the conventional LSH framework and aims at indexing and searching large-scale high-dimensional datasets. We first use a clustering algorithm to partition the raw feature dataset into clusters, and map these clusters to the nodes of a distributed computing cluster. Then, LSH is applied to construct an index for each cluster, and we present two criteria for choosing the cluster(s) to search in detail in order to improve search quality. The proposed framework has the following properties. Firstly, CLSH can cope with large-scale feature datasets. Secondly, the generated clusters guide the automatic mapping of the feature dataset onto a distributed computing cluster. Thirdly, search time is reduced substantially by searching on multiple computing nodes in parallel. Experiments show that the proposed approach outperforms existing approaches in terms of efficiency and scalability.
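The pipeline described in the abstract (cluster the data, build one LSH index per cluster, then pick which cluster(s) to search) can be sketched on a single machine as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, the LSH family, or the two cluster-selection criteria, so the code assumes plain k-means, random-hyperplane hashing, and a nearest-centroid rule as stand-ins. The names `kmeans`, `HyperplaneLSH`, `CLSH`, and `n_probe` are hypothetical, and the distribution of clusters across computing nodes is omitted.

```python
import math
import random
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(p, centroids):
    """Index of the centroid closest to p (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means (assumed; the abstract names no algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment is made against the final centroids.
    return centroids, [nearest(p, centroids) for p in points]


class HyperplaneLSH:
    """Random-hyperplane LSH index (assumed family, used as a stand-in)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        self.planes = [[[rng.gauss(0, 1) for _ in range(dim)]
                        for _ in range(n_bits)] for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _key(self, t, v):
        # One sign bit per hyperplane of table t.
        return tuple(sum(w * x for w, x in zip(plane, v)) >= 0
                     for plane in self.planes[t])

    def add(self, idx, v):
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(idx)

    def candidates(self, v):
        found = set()
        for t, table in enumerate(self.tables):
            found.update(table.get(self._key(t, v), []))
        return found


class CLSH:
    """One LSH index per cluster; queries probe only the chosen cluster(s)."""

    def __init__(self, points, k):
        self.points = points
        self.centroids, assign = kmeans(points, k)
        self.indexes = [HyperplaneLSH(len(points[0]), seed=c) for c in range(k)]
        for i, (p, c) in enumerate(zip(points, assign)):
            self.indexes[c].add(i, p)

    def query(self, q, n_probe=1, top=1):
        # Stand-in selection rule: probe the n_probe nearest centroids.
        order = sorted(range(len(self.centroids)),
                       key=lambda c: dist(q, self.centroids[c]))
        cand = set()
        for c in order[:n_probe]:
            cand |= self.indexes[c].candidates(q)
        # Rank the colliding candidates by exact distance.
        return sorted(cand, key=lambda i: dist(q, self.points[i]))[:top]
```

In this sketch, each per-cluster index is independent of the others, which is what would allow the clusters to be placed on separate computing nodes and queried in parallel. Raising `n_probe` trades extra search work for a lower chance of missing neighbors that fall just across a cluster boundary.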