Abstract: Most existing cross-modal hashing methods suffer from poor scalability in the training phase. In this paper, we propose a novel cross-modal hashing approach whose training time complexity is linear in the training data size, enabling scalable indexing for multimedia search across multiple modalities. Taking into account both the intra-similarity within each modality and the inter-similarity across different modalities, the proposed approach aims to effectively learn hash functions from large-scale training datasets. More specifically, for each modality, we first partition the training data into $k$ clusters and then represent each training data point by its distances to the $k$ cluster centroids. Interestingly, this $k$-dimensional data representation reduces the time complexity of the training phase from the traditional $O(n^2)$ or higher to $O(n)$, where $n$ is the training data size, which makes learning on large-scale datasets practical. We further prove that this new representation preserves the intra-similarity within each modality. To preserve the inter-similarity among data points across different modalities, we transform the derived data representations into a common binary subspace in which the binary codes from all modalities are "consistent" and comparable. The transformation simultaneously outputs the hash functions for all modalities, which are used to convert unseen data into binary codes. A query from one modality is first mapped to binary codes using that modality's hash functions and then matched against the database binary codes of any other modality. Experimental results on two benchmark datasets confirm the scalability and effectiveness of the proposed approach in comparison with the state of the art.
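The per-modality representation step and the query-matching step described above can be illustrated with a minimal sketch. It assumes Euclidean distance and plain Lloyd's k-means, since the abstract fixes neither choice, and the helper names `kmeans`, `centroid_representation`, `gram_matrix` and `hamming` are hypothetical. The `gram_matrix` helper shows why training cost stays linear in $n$. Any statistic accumulated row by row over the $n \times k$ representation $Z$, such as the $k \times k$ matrix $Z^\top Z$, costs $O(nk^2)$ instead of the $O(n^2)$ needed for an $n \times n$ pairwise similarity matrix. Learning the common binary subspace is not shown.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def kmeans(points: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means (an assumed choice); each iteration costs O(n * k * d)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        # Assignment step: put every point in the bucket of its nearest centroid.
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            buckets[j].append(p)
        # Update step: move each centroid to its bucket mean (keep it if the bucket is empty).
        for j, bucket in enumerate(buckets):
            if bucket:
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def centroid_representation(points: Sequence[Vector], centroids: Sequence[Vector]) -> List[Vector]:
    """Map each d-dimensional point to the k-vector of its distances to the centroids."""
    return [[math.dist(p, c) for c in centroids] for p in points]


def gram_matrix(z: Sequence[Vector]) -> List[Vector]:
    """Accumulate the k x k matrix Z^T Z in one pass over the rows: O(n * k^2), linear in n."""
    k = len(z[0])
    g = [[0.0] * k for _ in range(k)]
    for row in z:
        for a in range(k):
            for b in range(k):
                g[a][b] += row[a] * row[b]
    return g


def hamming(code_a: Sequence[int], code_b: Sequence[int]) -> int:
    """Count the differing bits between two equal-length binary codes (used for query matching)."""
    return sum(x != y for x, y in zip(code_a, code_b))


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    cents = kmeans(data, k=2)
    z = centroid_representation(data, cents)
    print(sorted(cents))
    print(gram_matrix(z))
```

In the usage example, `data` consists of two well-separated groups, so `kmeans` recovers their means. Points from the same group then get nearby rows in `z`, which is the intuition behind the claim that the representation preserves intra-modality similarity.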