Abstract: Deep supervised hashing techniques have exhibited remarkable efficiency in cross-modal retrieval tasks because they transform data from different modalities into compact binary codes that preserve semantic similarity structures. Nonetheless, existing methods often rely on pairwise or triplet relationships within known (in-distribution) semantics during training, and thus fail to capture the comprehensive ranking information inherent in web data that encompasses diverse concepts. In addition, these methods are vulnerable to out-of-distribution (OOD) semantic data when applied in realistic scenarios, resulting in suboptimal performance. In this paper, we propose ranking distribution preserving hashing (RDPH) to address these problems. We present a novel ranking loss, a differentiable surrogate for maximizing the NDCG metric in cross-modal retrieval. This loss incorporates two target ranking distributions, derived from the ideal NDCG scores of samples and from the cosine similarity of features, respectively. These distributions encourage RDPH to generate hash codes that approximate the desired inter-modal and intra-modal ranking distributions. To enhance the robustness of the hash codes against OOD data, RDPH leverages the CLIP paradigm to acquire OOD-resilient intermediate representations. In addition, we adopt an outlier exposure strategy: auxiliary pseudo-OOD data are constructed from known data in feature space and used as supervision, so that the hash codes better discriminate OOD samples. Experiments on three datasets demonstrate that the proposed method achieves state-of-the-art performance on regular retrieval tasks and good results on simulated real-world retrieval tasks.
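To make the quantities named in the abstract concrete, the following is a minimal sketch of how NDCG is computed for a ranked list and how a target ranking distribution could be compared with a predicted one. It is not the RDPH loss: the abstract does not give its exact form, so the softmax-based distributions and the KL divergence used here are a generic listwise stand-in, and the names `dcg`, `ndcg`, `softmax`, `listwise_kl`, and the `temperature` parameter are all assumptions introduced for illustration.

```python
import numpy as np

def dcg(relevance):
    """Discounted cumulative gain of relevance scores given in ranked order."""
    relevance = np.asarray(relevance, dtype=float)
    positions = np.arange(1, relevance.size + 1)
    # Exponential gain with a log2 position discount (a common NDCG convention).
    return float(np.sum((2.0 ** relevance - 1.0) / np.log2(positions + 1)))

def ndcg(relevance_in_ranked_order):
    """DCG divided by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg(np.sort(relevance_in_ranked_order)[::-1])
    return dcg(relevance_in_ranked_order) / ideal if ideal > 0 else 0.0

def softmax(scores, temperature=1.0):
    """Turn a score vector into a probability distribution over items."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def listwise_kl(target_scores, predicted_scores, temperature=1.0):
    """KL divergence between target and predicted ranking distributions.

    Assumption: both distributions are softmaxes over scores. This is a
    generic listwise surrogate, not the loss defined in the paper.
    """
    p = softmax(target_scores, temperature)
    q = softmax(predicted_scores, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Hypothetical multi-label example: relevance = number of labels shared
# between the query and each database item.
query_labels = np.array([1, 1, 0])
db_labels = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1]])
relevance = db_labels @ query_labels  # one relevance score per database item

perfect_ranking = ndcg(relevance)         # items already in ideal order
reversed_ranking = ndcg(relevance[::-1])  # worst ordering of the same items
```

In this sketch, a ranking that places items with more shared labels first scores an NDCG of 1, and `listwise_kl` is zero only when the predicted scores induce the same distribution as the target scores. In a hashing setting, the predicted scores would typically be similarities between relaxed hash codes.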

Deep Ranking Distribution Preserving Hashing for Robust Multi-Label Cross-modal Retrieval