Abstract: Hashing has been widely applied to the large-scale approximate nearest neighbor search problem owing to its high efficiency and low storage requirement. Most studies focus on learning hash methods in a centralized setting. However, in existing big data systems, data is often stored across different nodes, and in some situations it is even collected in a distributed manner. A straightforward solution is to aggregate all the data at a fusion center and obtain the search result there (the aggregating method). However, this strategy is infeasible because of its prohibitive communication cost. Although a few distributed hashing methods have been proposed to reduce this cost, they focus only on designing a distributed algorithm for a specific global optimization objective, without considering scalability. Moreover, existing distributed hashing methods aim to find a distributed solution to hashing that avoids accuracy loss, rather than to improve accuracy. To address these challenges, we propose a Scalable Distributed Hashing (SDisH) model that allows most existing hashing methods to process distributed data without modification. Furthermore, to improve accuracy, we use the search radius as a global variable shared across nodes, so that every iteration yields a globally optimal search result. In addition, we present a voting algorithm that combines the results of multiple iterations to further reduce search errors. Theoretical analyses of communication, computation, and accuracy demonstrate the superiority of the proposed model. Numerical simulations on three large-scale and two relatively small benchmark datasets also show that the SDisH model achieves accuracy gains of up to 44.75% and 10.23% over the aggregating method and state-of-the-art distributed hashing methods, respectively.
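The abstract names two mechanisms, a search radius shared as a global variable across nodes and a vote over the results of multiple iterations, without giving their details. The following is a minimal, hypothetical sketch of one way such pieces could fit together; it is not the SDisH algorithm. It assumes binary hash codes stored as Python ints, that each node sends only a Hamming-distance histogram (bits + 1 counts) to the fusion center instead of its data, and that "multiple iterations" means independent hash tables whose top-k lists are combined by majority vote. All function names and the toy data are illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: a node maps point id -> binary hash code (stored as an int).
Node = Dict[int, int]


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def global_radius(histograms: Sequence[List[int]], k: int) -> int:
    """Fusion center: smallest radius whose summed neighbor count reaches k."""
    total = [sum(col) for col in zip(*histograms)]  # element-wise sum over nodes
    running = 0
    for r, count in enumerate(total):
        running += count
        if running >= k:
            return r
    return len(total) - 1  # fewer than k points overall: use the largest radius


def distributed_search(nodes: Sequence[Node], query: int, k: int, bits: int) -> List[int]:
    """One iteration: nodes send histograms, receive the shared radius, return hits."""
    histograms = []
    for codes in nodes:  # each node builds a (bits + 1)-bin histogram locally
        hist = [0] * (bits + 1)
        for code in codes.values():
            hist[hamming(code, query)] += 1
        histograms.append(hist)
    r = global_radius(histograms, k)  # the same radius is broadcast to every node
    hits: List[Tuple[int, int]] = []
    for codes in nodes:  # each node ships only the points within radius r
        for pid, code in codes.items():
            d = hamming(code, query)
            if d <= r:
                hits.append((d, pid))
    hits.sort()  # closest first; ties broken by id for determinism
    return [pid for _, pid in hits[:k]]


def vote(results: Sequence[List[int]], k: int) -> List[int]:
    """Keep the k ids returned most often across iterations (ties: smaller id)."""
    counts = Counter(pid for result in results for pid in result)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pid for pid, _ in ranked[:k]]


# Toy example: two nodes, three iterations with different 4-bit hash tables.
rounds = [
    ([{0: 0b0000, 1: 0b1111}, {2: 0b0001, 3: 0b0111}], 0b0000),
    ([{0: 0b1000, 1: 0b0110}, {2: 0b1111, 3: 0b1001}], 0b1000),
    ([{0: 0b0001, 1: 0b1110}, {2: 0b0000, 3: 0b1100}], 0b0000),
]
per_round = [distributed_search(nodes, q, k=2, bits=4) for nodes, q in rounds]
result = vote(per_round, k=2)
print(per_round, result)
```

In this toy run each iteration exchanges only five integers per node plus the returned ids, which illustrates why sharing a radius can be much cheaper than shipping every code to a fusion center; the real communication and accuracy trade-offs are those analyzed in the paper, not in this sketch.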

A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search

Learning to Hash for Indexing Big Data - A Survey

Complementary Hashing for Approximate Nearest Neighbor Search

Experimental Analysis of Locality Sensitive Hashing Techniques for High-Dimensional Approximate Nearest Neighbor Searches

Weakly Principal Component Hashing with Multiple Tables

Distributed Discrete Hashing by Equivalent Continuous Formulation

Fast and Accurate Hashing Via Iterative Nearest Neighbors Expansion

Sparse Matrix Based Hashing for Approximate Nearest Neighbor Search

A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search

Neighborhood Voting: A Novel Search Scheme for Hashing

LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index

A Unified Approximate Nearest Neighbor Search Scheme by Combining Data Structure and Hashing

Preserving-Ignoring Transformation Based Index for Approximate k Nearest Neighbor Search

Improving Locality Sensitive Hashing by Efficiently Finding Projected Nearest Neighbors

Scalable Distributed Hashing for Approximate Nearest Neighbor Search

A Survey on Deep Hashing Methods

Density Sensitive Hashing

Approximate Nearest Neighbor Based Feature Quantization Algorithm For Robust Hashing

Harmonious Hashing

Approximate Nearest Neighbor Search on High Dimensional Data — Experiments, Analyses, and Improvement

Frequency Based Locality Sensitive Hashing