Abstract: Cross-modal retrieval (i.e., image–query–text or text–query–image) is an active research topic in multimedia information retrieval, but the heterogeneity gap between modalities poses a critical challenge for multimodal data. Some researchers treat cross-modal retrieval as a learning-to-rank task and measure the similarity between two modalities in a shared embedding subspace. However, most previous methods focus on constructing a discriminative objective function to optimize the common space and neglect the correlations within each single modality. In this paper, we treat the cross-modal retrieval task, from the perspective of optimizing the ranking model, as a listwise ranking problem, and propose a novel method called learning to rank with relational graph and pointwise constraint (\( {\text{LR}}^{2} {\text{GP}} \)). In \( {\text{LR}}^{2} {\text{GP}} \), we first propose a discriminative ranking model that exploits the relations within each single modality to improve ranking performance and thereby learn an optimal common embedding subspace. Then, a pointwise constraint is introduced in the low-dimensional embedding subspace to compensate for the real loss in the training phase, since the listwise method only optimizes the latent permutation directly from a global perspective. Finally, a dynamic interpolation algorithm, which gradually transitions from pointwise and pairwise to listwise learning, is adopted to fuse the loss functions in a reasonable way. Experiments on the Wikipedia and Pascal benchmark datasets demonstrate the effectiveness of the proposed method.
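
The idea of moving gradually from a pointwise to a listwise objective can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption for illustration only: the ListNet-style top-1 cross-entropy stands in for the listwise term, mean squared error stands in for the pointwise constraint, the pairwise term is omitted for brevity, and the linear schedule in `interpolation_weight` is hypothetical. None of these names come from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, labels):
    """ListNet-style top-1 cross-entropy (assumed stand-in for the listwise term)."""
    p_true = softmax(labels)
    p_pred = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


def pointwise_loss(scores, labels):
    """Mean squared error per item (assumed stand-in for the pointwise constraint)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def interpolation_weight(step, total_steps):
    """Hypothetical linear schedule: 0.0 means pure pointwise, 1.0 means pure listwise."""
    return min(1.0, max(0.0, step / total_steps))


def combined_loss(scores, labels, step, total_steps):
    """Blend the two losses, shifting weight toward the listwise term over training."""
    alpha = interpolation_weight(step, total_steps)
    return (1.0 - alpha) * pointwise_loss(scores, labels) + alpha * listwise_loss(scores, labels)


if __name__ == "__main__":
    scores = [0.2, 1.5, 0.7]   # model scores for three candidate items
    labels = [0.0, 2.0, 1.0]   # graded relevance of the same items
    for step in (0, 5, 10):
        print(step, combined_loss(scores, labels, step, total_steps=10))
```

Early in training the blended loss is dominated by the per-item term, which gives a dense signal for every candidate; later the weight shifts to the list-level term, which only cares about the induced ordering. Whether the paper uses a linear schedule or some other transition is not stated in the abstract.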

Cross-Modal Retrieval by Class Information and Listwise Ranking

Cross-Modal Learning to Rank with Adaptive Listwise Constraint

Learning to Rank with Relational Graph and Pointwise Constraint for Cross-Modal Retrieval

Cross-modal Retrieval with Dual Optimization

Learning Multimodal Neural Network with Ranking Examples

Deep Pairwise Ranking with Multi-label Information for Cross-Modal Retrieval

Cross-Modal Retrieval Using Multiordered Discriminative Structured Subspace Learning

Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions

Bi-CMR: Bidirectional Reinforcement Guided Hashing for Effective Cross-Modal Retrieval

Adversarial Cross-Modal Retrieval via Learning and Transferring Single-Modal Similarities

PL-ranking

Coordinated and Specific Restricted Boltzmann Machine for Cross-Modal Retrieval

Deep Compositional Cross-modal Learning to Rank via Local-Global Alignment

Cross Modal Retrieval Algorithm Based on Iterative Queries

Cross-Modal Learning With Images, Texts And Their Semantics

Cross-modal Deep Metric Learning with Multi-Task Regularization

A Comprehensive Survey on Cross-modal Retrieval

Semi-supervised Cross-Modal Learning for Cross Modal Retrieval and Image Annotation

Cross-Modal Learning to Rank Via Latent Joint Representation

HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval