Dsh: Data Sensitive Hashing For High-Dimensional K-Nn Search

Jinyang Gao,H. Jagadish,Wei Lu,Beng Chin Ooi
DOI: https://doi.org/10.1145/2588555.2588565
2014-01-01
Abstract:The need to locate the k-nearest data points with respect to a given query point in a multi- and high-dimensional space is common in many applications. Therefore, it is essential to provide efficient support for such a search. Locality Sensitive Hashing (LSH) has been widely accepted as an effective hash method for high-dimensional similarity search. However, data sets are typically not distributed uniformly over the space, and as a result, the buckets of LSH are unbalanced, causing the performance of LSH to degrade.In this paper, we propose a new and efficient method called Data Sensitive Hashing (DSH) to address this drawback. DSH improves the hashing functions and hashing family, and is orthogonal to most of the recent state-of-the-art approaches which mainly focus on indexing and querying strategies. DSH leverages data distributions and is capable of directly preserving the nearest neighbor relations. We show the theoretical guarantee of DSH, and demonstrate its efficiency experimentally.
What problem does this paper attempt to address?