Abstract: Recently, numerous unsupervised cross-modal hashing methods have been proposed to handle image-text retrieval tasks on unlabeled cross-modal data. However, when learning to generate hash codes, almost all of them lack modality interaction in the following two aspects: (1) The instance similarity matrix used to guide the training of the hashing networks is constructed without image-text interaction, so it fails to capture the fine-grained cross-modal cues needed to characterize the intrinsic semantic similarity among data points. (2) The binary codes used in the quantization loss are inferior, because they are generated by directly quantizing a simple combination of the continuous hash codes from different modalities, with no interaction among these continuous codes. These problems degrade the quality of the generated hash codes and, in turn, the retrieval performance. To address them, in this paper we propose a novel Unsupervised Cross-modal Hashing with Modality-interaction method, termed UCHM. Specifically, a modality-interaction-enabled (MIE) similarity generator is first trained by optimizing a novel hash-similarity-friendly loss, and it then generates a superior MIE similarity matrix for the training set. This MIE similarity matrix is used as guiding information to train the deep hashing networks. Furthermore, during the training of the hashing networks, a novel bit-selection module generates high-quality unified binary codes for the quantization loss through interaction among the continuous codes from different modalities, further enhancing retrieval performance. Extensive experiments on two widely used datasets show that the proposed UCHM outperforms state-of-the-art techniques on cross-modal retrieval tasks.
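The abstract contrasts two ingredients of similarity-guided unsupervised cross-modal hashing: a similarity matrix S that supervises the hashing networks, and unified binary codes B used in the quantization loss. The minimal NumPy sketch below illustrates both under stated assumptions. `interaction_free_similarity` is an assumed baseline (the average of intra-modal cosine similarities, built without image-text interaction), `naive_unified_codes` is the "quantize a simple combination" scheme the abstract criticizes, and `similarity_guided_bit_selection` is a hypothetical per-bit rule invented for illustration. It is not the MIE similarity generator or the bit-selection module of UCHM, whose details the abstract does not give. All function names and the toy dimensions are hypothetical.

```python
import numpy as np


def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def interaction_free_similarity(f_img, f_txt):
    """Assumed baseline: average of the intra-modal cosine similarity matrices.
    Image and text features never interact while this matrix is built."""
    return 0.5 * (cosine_matrix(f_img, f_img) + cosine_matrix(f_txt, f_txt))


def binarize(x):
    """Map continuous values to {-1, +1}; exact zeros go to +1."""
    return np.where(x >= 0, 1.0, -1.0)


def naive_unified_codes(h_img, h_txt):
    """Scheme criticized in the abstract: quantize a simple combination
    (here, the sum) of the continuous codes from both modalities."""
    return binarize(h_img + h_txt)


def similarity_guided_bit_selection(h_img, h_txt, s):
    """HYPOTHETICAL rule, not the UCHM module: for each bit k, keep the
    modality whose binary column b_k scores higher on b_k^T S b_k, i.e.
    agrees better with the guiding similarity matrix S."""
    b_img, b_txt = binarize(h_img), binarize(h_txt)
    score_img = np.einsum("ik,ij,jk->k", b_img, s, b_img)
    score_txt = np.einsum("ik,ij,jk->k", b_txt, s, b_txt)
    # The score arrays have shape (k,) and broadcast across the n rows.
    return np.where(score_img >= score_txt, b_img, b_txt)


def similarity_loss(h_img, h_txt, s):
    """Mean squared gap between cross-modal code similarities and S."""
    return float(np.mean((cosine_matrix(h_img, h_txt) - s) ** 2))


def quantization_loss(h_img, h_txt, b):
    """Mean squared distance of each modality's continuous codes to b."""
    return float(np.mean((h_img - b) ** 2) + np.mean((h_txt - b) ** 2))


# Toy usage with random stand-ins for features and network outputs.
rng = np.random.default_rng(0)
n, d_img, d_txt, k = 8, 16, 12, 4
f_img = rng.normal(size=(n, d_img))
f_txt = rng.normal(size=(n, d_txt))
s = interaction_free_similarity(f_img, f_txt)
h_img = np.tanh(rng.normal(size=(n, k)))
h_txt = np.tanh(rng.normal(size=(n, k)))
b = similarity_guided_bit_selection(h_img, h_txt, s)
print(similarity_loss(h_img, h_txt, s), quantization_loss(h_img, h_txt, b))
```

With two modalities, quantizing the sum is equivalent (apart from ties) to taking each bit from whichever modality has the larger magnitude, so the naive scheme never consults how a bit relates to other samples. The hypothetical rule instead scores each candidate bit column against S, which is one simple way to let the modalities interact when forming B.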

Text-assisted attention-based cross-modal hashing

Discrete Cross-Modal Hashing for Efficient Multimedia Retrieval

Frustratingly Easy Cross-Modal Hashing

Semantic Consistency Hashing for Cross-Modal Retrieval

Supervised Coarse-to-Fine Semantic Hashing for Cross-Media Retrieval

Nonlinear Discrete Cross-Modal Hashing for Visual-Textual Data

Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval

Text-Enhanced Graph Attention Hashing for Cross-Modal Retrieval

Unsupervised Cross-modal Hashing via Semantic Text Mining

Unsupervised Multi-modal Hashing for Cross-Modal Retrieval

Deep Semantic-Alignment Hashing for Unsupervised Cross-Modal Retrieval

Asymmetric Supervised Consistent and Specific Hashing for Cross-Modal Retrieval

Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval

Unsupervised Cross-modal Hashing with Modality-interaction

Task-adaptive Asymmetric Deep Cross-modal Hashing

Discrete Semantic Alignment Hashing for Cross-Media Retrieval

Cross-modal image–text search via Efficient Discrete Class Alignment Hashing

Label-wise Deep Semantic-Alignment Hashing for Cross-Modal Retrieval

Hierarchical modal interaction balance cross-modal hashing for unsupervised image-text retrieval