Abstract: Most existing cross-modal hashing methods scale poorly in the training phase. In this paper, we propose a novel cross-modal hashing approach whose training time is linear in the training data size, enabling scalable indexing for multimedia search across multiple modalities. The proposed approach takes into account both the intra-similarity within each modality and the inter-similarity across modalities, and aims to learn hash functions effectively from large-scale training datasets. More specifically, for each modality, we first partition the training data into $k$ clusters and then represent each training data point by its distances to the $k$ cluster centroids. Such a $k$-dimensional data representation reduces the time complexity of the training phase from the traditional $O(n^2)$ or higher to $O(n)$, where $n$ is the training data size, making learning on large-scale datasets practical. We further prove that this new representation preserves the intra-similarity in each modality. To preserve the inter-similarity among data points across different modalities, we transform the derived data representations into a common binary subspace in which binary codes from all the modalities are "consistent" and comparable. The transformation simultaneously outputs the hash functions for all modalities, which are used to convert unseen data into binary codes. Given a query in one modality, it is first mapped to binary codes using that modality's hash functions and then matched against the database binary codes of any other modality. Experimental results on two benchmark datasets confirm the scalability and the effectiveness of the proposed approach in comparison with the state of the art.
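
The distance-to-centroids representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the plain Lloyd's k-means, the function names `kmeans` and `distance_features`, the random stand-in features `X_img`, and the sizes used are all assumptions for illustration, and the subsequent step of learning the common binary subspace is not covered. For fixed $k$, feature dimension, and iteration count, every step below touches each data point a constant number of times, so the cost grows linearly with $n$.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns a (k, d) centroid matrix."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct training points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distances from every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def distance_features(X, centroids):
    """Represent each row of X by its Euclidean distances to the k centroids."""
    diff = X[:, None, :] - centroids[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Hypothetical data: 200 points of one modality with 16-dimensional features.
rng = np.random.default_rng(0)
X_img = rng.normal(size=(200, 16))
k = 8

centroids = kmeans(X_img, k)
Z_img = distance_features(X_img, centroids)  # new k-dimensional representation
print(Z_img.shape)  # (200, 8)
```

In a cross-modal setting, the same two functions would be run separately on each modality's features, yielding one $k$-dimensional representation per modality as input to the hash-function learning step.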

Linear cross-modal hashing for efficient multimedia search

Discrete Cross-Modal Hashing for Efficient Multimedia Retrieval

Frustratingly Easy Cross-Modal Hashing

Online latent semantic hashing for cross-media retrieval

Nonlinear Discrete Cross-Modal Hashing for Visual-Textual Data

Semantic Consistency Hashing for Cross-Modal Retrieval

Discrete Similarity Preserving Hashing for Cross-modal Retrieval

Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval

Unsupervised Multi-modal Hashing for Cross-Modal Retrieval

Sequential Discrete Hashing for Scalable Cross-Modality Similarity Retrieval

Learning a Cross-Modal Hashing Network for Multimedia Search

Fast Discrete Collaborative Multi-Modal Hashing for Large-Scale Multimedia Retrieval

Asymmetric Supervised Consistent and Specific Hashing for Cross-Modal Retrieval

Deep Cross-Modal Hashing with Fine-Grained Similarity

Multi-modal Hashing for Efficient Multimedia Retrieval: A Survey

Cross-Modal Discrete Hashing

Transitive Hashing Network for Heterogeneous Multimedia Retrieval

MTFH: A Matrix Tri-Factorization Hashing Framework for Efficient Cross-Modal Retrieval

Robust and discrete matrix factorization hashing for cross-modal retrieval

Latent semantic-enhanced discrete hashing for cross-modal retrieval

Discriminative Coupled Dictionary Hashing for Fast Cross-Media Retrieval