Abstract: Inspired by the powerful representation capability of deep neural networks, deep cross-modal hashing has recently drawn much attention, and a variety of methods have been developed. However, two key problems have not yet been solved well: 1) how to find, with advanced neural network models, a multi-modal alignment space that effectively models the intrinsic multi-modal correlations and reduces the heterogeneous modality gaps; and 2) how to effectively and efficiently preserve the modelled multi-modal semantic correlations in the binary hash codes under the deep learning paradigm. In this paper, we propose a Hierarchical Message Aggregation Hashing (HMAH) method within an efficient teacher-student learning framework. Specifically, on the teacher end, we develop hierarchical message aggregation networks that construct a multi-modal complementary space by aggregating semantic messages hierarchically across different modalities, which better aligns the heterogeneous modalities and models the fine-grained multi-modal correlations. On the student end, we train a couple of student modules that learn hash functions to support cross-modal retrieval. We design a cross-modal correlation knowledge distillation strategy that seamlessly transfers the modelled fine-grained multi-modal semantic correlations from the teacher to the lightweight student modules. With this fine-grained knowledge supervision from the teacher module, the semantic representation capability of the hash functions is enhanced. In addition, unlike existing methods, the whole learning framework avoids time-consuming fine-tuning of the pre-trained deep models, which makes it computationally efficient. Experimental results demonstrate that the proposed method significantly improves both retrieval accuracy and efficiency compared with state-of-the-art deep cross-modal hashing methods.
The source code of our method is available at: https://github.com/FutureTwT/HMAH.
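
To make the teacher-student idea concrete, the following is a minimal sketch of correlation distillation for hashing. It is not the authors' implementation and is not taken from the repository above. It assumes that the teacher supplies one fused embedding per image-text pair, that each student is a single linear layer with a tanh relaxation, and that the distillation loss is a mean squared gap between teacher cosine similarities and cross-modal code similarities. All function names, shapes, and the random features are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    return x @ x.T


def correlation_distillation_loss(teacher_feats, img_codes, txt_codes):
    """Mean squared gap between teacher and student similarity matrices.

    Assumption: the student similarity is the scaled inner product between
    image codes and text codes, which lies in [-1, 1] for tanh-relaxed codes.
    """
    s_teacher = cosine_similarity_matrix(teacher_feats)
    n_bits = img_codes.shape[1]
    s_student = (img_codes @ txt_codes.T) / n_bits
    return float(np.mean((s_teacher - s_student) ** 2))


def hamming_rank(query_code, database_codes):
    """Database indices sorted by Hamming distance, for codes in {-1, +1}."""
    n_bits = database_codes.shape[1]
    dist = 0.5 * (n_bits - database_codes @ query_code)
    return np.argsort(dist, kind="stable")


rng = np.random.default_rng(0)
n_pairs, d_img, d_txt, n_bits = 8, 32, 24, 16

# Hypothetical pre-extracted features; the backbones are never fine-tuned here.
img_feats = rng.normal(size=(n_pairs, d_img))
txt_feats = rng.normal(size=(n_pairs, d_txt))

# Stand-in for the teacher: plain concatenation instead of the hierarchical
# message aggregation described in the abstract.
teacher_feats = np.concatenate([img_feats, txt_feats], axis=1)

# Hypothetical lightweight students: one linear layer followed by tanh.
w_img = rng.normal(size=(d_img, n_bits)) / np.sqrt(d_img)
w_txt = rng.normal(size=(d_txt, n_bits)) / np.sqrt(d_txt)
img_codes = np.tanh(img_feats @ w_img)
txt_codes = np.tanh(txt_feats @ w_txt)

loss = correlation_distillation_loss(teacher_feats, img_codes, txt_codes)

# At retrieval time the relaxed codes are binarized and ranked by Hamming distance.
binary_txt = np.where(txt_codes >= 0, 1.0, -1.0)
binary_img = np.where(img_codes >= 0, 1.0, -1.0)
ranking = hamming_rank(binary_txt[0], binary_img)  # text query against image database
```

In a real training loop the loss would be minimized with respect to `w_img` and `w_txt` only, which is what keeps the student side lightweight in this sketch.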

Teacher-Student Learning: Efficient Hierarchical Message Aggregation Hashing for Cross-Modal Retrieval

Discrete Two-Step Cross-Modal Hashing Through the Exploitation of Pairwise Relations

Multi-Relational Deep Hashing for Cross-Modal Search

Efficient Semi-Supervised Multimodal Hashing With Importance Differentiation Regression

CKDH: CLIP-based Knowledge Distillation Hashing for Cross-modal Retrieval

Object-Level Visual-Text Correlation Graph Hashing for Unsupervised Cross-Modal Retrieval

Deep Multimodal Hashing with Orthogonal Regularization

Large-Scale Cross-Modality Search via Collective Matrix Factorization Hashing