Abstract: Multi-modal hashing fuses different modalities and exploits the complementarity of heterogeneous multi-modal data to learn compact hash codes. However, existing multi-modal hashing methods still suffer from several problems: 1) Almost all existing methods generate uninterpretable hash codes. They assume that every hash bit contributes equally to the retrieval results, ignoring the discriminative information embedded during hash learning and the semantic similarity used in hash retrieval. Moreover, the hash code length is set empirically, which causes bit redundancy and degrades retrieval accuracy. 2) Most existing methods rely on shallow models, which fail to fully capture the higher-level correlations of multi-modal data. 3) Most existing methods adopt an online hashing strategy based on an immutable direct projection, which generates query codes for new samples without considering the differences between semantic categories. In this paper, we propose a Semantic-driven Interpretable Deep Multi-modal Hashing (SIDMH) method that generates interpretable hash codes driven by semantic categories within a deep hashing architecture and addresses all three problems in a single integrated model. The main contributions are: 1) A novel deep multi-modal hashing network is developed to progressively extract hidden representations of heterogeneous modality features and deeply exploit the complementarity of multi-modal data. 2) Interpretable hash codes are learned, with the discriminative information of different categories distinctively embedded into the hash codes and their different impacts on hash retrieval intuitively explained. In addition, the code length depends on the number of categories in the dataset, which reduces bit redundancy and improves retrieval accuracy. 3) A semantic-driven online hashing strategy encodes the significant branches and discards the negligible branches of each query sample according to the semantics it contains, so that it can capture the different semantics in dynamic queries. Finally, we consider both the nearest-neighbor similarity and the semantic similarity of hash codes. Experiments on several public multimedia retrieval datasets validate the superiority of the proposed method.
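The idea of category-dependent code length and branch selection at query time can be illustrated with a small sketch. This is not the SIDMH implementation: the abstract does not specify the network, the code layout, or the selection rule, so everything below is an assumption made for illustration. In particular, the layout (one fixed-size block of bits per category, called a "branch" here), the score threshold, and the function names `branch_mask`, `masked_hamming`, and `rank` are all hypothetical.

```python
from typing import List, Sequence


def branch_mask(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Keep a category branch only if the query's score for it is significant.

    Assumption: each query comes with one semantic score per category.
    """
    return [s >= threshold for s in scores]


def masked_hamming(
    query: Sequence[int],
    item: Sequence[int],
    keep: Sequence[bool],
    bits_per_branch: int,
) -> int:
    """Hamming distance computed only over the branches that are kept."""
    dist = 0
    for b, kept in enumerate(keep):
        if not kept:
            continue  # negligible branch: its bits do not affect the ranking
        start = b * bits_per_branch
        end = start + bits_per_branch
        for q, x in zip(query[start:end], item[start:end]):
            dist += q != x
    return dist


def rank(
    query: Sequence[int],
    scores: Sequence[float],
    database: Sequence[Sequence[int]],
    bits_per_branch: int,
    threshold: float = 0.5,
) -> List[int]:
    """Return database indices sorted by masked Hamming distance to the query."""
    keep = branch_mask(scores, threshold)
    dists = [masked_hamming(query, item, keep, bits_per_branch) for item in database]
    return sorted(range(len(database)), key=lambda i: dists[i])


# Toy data: 3 categories x 2 bits per category = 6-bit codes (hypothetical values).
query = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.1, 0.7]  # the second category is negligible for this query
database = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 0],
]
order = rank(query, scores, database, bits_per_branch=2)
```

In this toy setup the total code length is the number of categories times `bits_per_branch`, so it grows with the label set rather than being chosen empirically, and differences in a discarded branch do not influence the ranking of a query that lacks that semantic.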

Multimedia Retrieval by Deep Hashing with Multilevel Similarity Learning

Discrete Cross-Modal Hashing for Efficient Multimedia Retrieval

Deep hashing with multilevel similarity learning for multimedia similarity search

Scalable Multimedia Retrieval By Deep Learning Hashing With Relative Similarity Learning

Deep Semantic Multimodal Hashing Network for Scalable Image-Text and Video-Text Retrievals

Semantic-Driven Interpretable Deep Multi-Modal Hashing for Large-Scale Multimedia Retrieval

Deep Semantic Correlation Learning Based Hashing for Multimedia Cross-Modal Retrieval

Fast Discrete Collaborative Multi-Modal Hashing for Large-Scale Multimedia Retrieval

Deep Self-Supervised Hashing With Fine-Grained Similarity Mining for Cross-Modal Retrieval

Deep Multi-Similarity Hashing Via Label-Guided Network for Cross-Modal Retrieval

Multi-modal Hashing for Efficient Multimedia Retrieval: A Survey

Deep Semantic-Preserving Ordinal Hashing for Cross-Modal Similarity Search

Multi-task Learning for Deep Semantic Hashing

Deep Hashing Network for Efficient Similarity Retrieval

Joint Image-Text Hashing for Fast Large-Scale Cross-Media Retrieval Using Self-Supervised Deep Learning

A Novel Cross Modal Hashing Algorithm Based on Multi-modal Deep Learning

Deep Multiscale Fusion Hashing for Cross-Modal Retrieval

Deep Multilevel Similarity Hashing with Fine-Grained Features for Multi-Label Image Retrieval

Deep-Like Hashing-in-Hash for Visual Retrieval: an Embarrassingly Simple Method

Sequential Discrete Hashing for Scalable Cross-Modality Similarity Retrieval

Flexible Multi-modal Hashing for Scalable Multimedia Retrieval