Abstract: Hashing has been widely applied to large-scale approximate nearest neighbor search owing to its high efficiency and low storage requirements. Most work concentrates on learning hashing methods in a centralized setting. However, in existing big data systems, data is often stored across different nodes, and in some situations it is even collected in a distributed manner. A straightforward solution is to aggregate all the data at a fusion center and compute the search result there (the aggregating method), but this strategy is infeasible because of its prohibitive communication cost. Although a few distributed hashing methods have been proposed to reduce this cost, they focus only on designing a distributed algorithm for a specific global optimization objective and do not consider scalability. Moreover, existing distributed hashing methods aim to find a distributed solution to hashing while avoiding accuracy loss, rather than to improve accuracy. To address these challenges, we propose a Scalable Distributed Hashing (SDisH) model in which most existing hashing methods can be extended to process distributed data without modification. Furthermore, to improve accuracy, we use the search radius as a global variable shared across nodes to achieve a globally optimal search result in every iteration. In addition, we present a voting algorithm that combines the results of multiple iterations to further reduce search errors. Theoretical analyses of communication, computation, and accuracy demonstrate the superiority of the proposed model. Numerical simulations on three large-scale and two relatively small benchmark datasets also show that the SDisH model achieves accuracy gains of up to 44.75% and 10.23% over the aggregating method and state-of-the-art distributed hashing methods, respectively.
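
The two ideas named in the abstract, a search radius shared across nodes and a vote over several iterations, can be illustrated with a small sketch. This is not the SDisH algorithm itself, whose details the abstract does not give. It assumes random-hyperplane hashing as a stand-in for whatever hashing method is plugged in, and every name in it (`make_hasher`, `global_radius_search`, `voted_search`) is hypothetical.

```python
import random
from collections import Counter


def make_hasher(dim, bits, seed):
    """Random-hyperplane hash (an assumed stand-in for any hashing method)."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]

    def encode(vec):
        # One sign bit per random direction.
        return tuple(
            int(sum(p * x for p, x in zip(plane, vec)) >= 0) for plane in planes
        )

    return encode


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def global_radius_search(nodes, encode, query, k):
    """One iteration: nodes share distances only, then agree on one radius."""
    q = encode(query)
    # Each node computes Hamming distances locally; raw vectors stay on the node.
    local = [
        [(hamming(encode(vec), q), item_id) for item_id, vec in node]
        for node in nodes
    ]
    # Each node reports only its k smallest distances to the coordinator.
    reported = sorted(d for dists in local for d, _ in sorted(dists)[:k])
    # The global radius is the k-th smallest reported distance.
    radius = reported[min(k, len(reported)) - 1]
    # Every node returns the ids that fall inside the shared radius.
    hits = sorted((d, i) for dists in local for d, i in dists if d <= radius)
    return [i for _, i in hits[:k]]


def voted_search(nodes, query, k, dim, bits=16, rounds=5):
    """Repeat the search with independent hash functions and vote on the ids."""
    votes = Counter()
    for seed in range(rounds):
        encode = make_hasher(dim, bits, seed)
        votes.update(global_radius_search(nodes, encode, query, k))
    return [item_id for item_id, _ in votes.most_common(k)]
```

Here `nodes` is a list of per-node lists of `(item_id, vector)` pairs with integer ids. In this sketch only integer distances and ids cross node boundaries, which is the kind of communication saving the abstract argues for; the actual message format and radius update rule of SDisH may differ.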

Semi-randomized Hashing for Large Scale Data Retrieval

Large-scale Image Retrieval Based on Boosting Iterative Quantization Hashing with Query-Adaptive Reranking

Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval

Supervised Coarse-to-Fine Semantic Hashing for Cross-Media Retrieval

Data driven multi-index hashing

Hash Learning with Variable Quantization for Large-scale Retrieval

Density Sensitive Hashing

Fast and Accurate Hashing Via Iterative Nearest Neighbors Expansion

Distributed Adaptive Binary Quantization for Fast Nearest Neighbor Search

Robust Discrete Spectral Hashing for Large-Scale Image Semantic Indexing

Learning to Hash for Indexing Big Data - A Survey

Harmonious Hashing

Kernel-Based Supervised Discrete Hashing For Image Retrieval

Scalable Distributed Hashing for Approximate Nearest Neighbor Search

Double-Bit Quantization and Index Hashing for Nearest Neighbor Search

Query-Adaptive Hash Code Ranking for Large-Scale Multi-View Visual Search

SSDH: Semi-Supervised Deep Hashing for Large Scale Image Retrieval

Large-scale Image Retrieval with Supervised Sparse Hashing

Online Matrix Factorization Hashing for Large-Scale Image Retrieval

Sequential Discrete Hashing for Scalable Cross-Modality Similarity Retrieval

Learning Flexible Binary Code for Linear Projection Based Hashing with Random Forest