Abstract: Most cross-modal retrieval methods assume the multi-modal training data is complete and has a one-to-one correspondence. However, in the real world, multi-modal data generally suffers from missing modality information due to the uncertainty of data collection and storage processes, which limits the practical application of existing cross-modal retrieval methods. Although some solutions have been proposed to generate the missing modality data using a single pseudo sample, a single sample provides limited semantic information, which may lead to incomplete semantic restoration and sub-optimal retrieval results. To address this challenge, this article proposes an Incomplete Cross-Modal Retrieval with Deep Correlation Transfer (ICMR-DCT) method that can robustly model incomplete multi-modal data and dynamically capture the adjacency semantic correlation for cross-modal retrieval. Specifically, we construct an intra-modal graph attention-based auto-encoder to learn modality-invariant representations by performing semantic reconstruction through intra-modality adjacency correlation mining. Then, we design dual cross-modal alignment constraints to project multi-modal representations into a common semantic space, thus bridging the heterogeneous modality gap and enhancing the discriminability of the common representation. We further introduce semantic preservation to enhance adjacency semantic information and achieve cross-modal semantic correlation. Moreover, we propose a nearest-neighbor weighting integration strategy with cross-modal correlation transfer to generate the missing modality data according to inter-modality mapping relations and adjacency correlations between each sample and its neighbors, which improves the robustness of our method against incomplete multi-modal training data.
Extensive experiments on three widely used benchmark datasets demonstrate the superior performance of our method in cross-modal retrieval tasks under both complete and incomplete retrieval scenarios. Our datasets and source code are available at https://github.com/shidan0122/DCT.git.
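
To make the idea of generating a missing modality from several neighbors (rather than a single pseudo sample) concrete, the following is a minimal illustrative sketch and not the ICMR-DCT implementation. It assumes cosine similarity in the available modality, a softmax over the top-k similarities as the neighbor weights, and raw feature vectors in place of learned common-space representations. The function name `impute_missing_modality` and the parameters `k` and `temperature` are hypothetical.

```python
import numpy as np

def impute_missing_modality(src, tgt, observed, k=3, temperature=0.1):
    """Fill rows of `tgt` that are missing, using neighbours found in `src`.

    src      : (n, d_src) features of the modality available for every sample
    tgt      : (n, d_tgt) features of the other modality; missing rows may be NaN
    observed : (n,) boolean mask, True where the `tgt` row is actually present
    NOTE: similarity measure and softmax weighting are assumptions of this sketch.
    """
    src = np.asarray(src, dtype=float)
    filled = np.array(tgt, dtype=float, copy=True)
    observed = np.asarray(observed, dtype=bool)

    donors = np.flatnonzero(observed)
    if donors.size == 0:
        raise ValueError("at least one sample must have the target modality")
    k = min(k, donors.size)

    # Cosine similarity between all samples in the available (source) modality.
    norms = np.maximum(np.linalg.norm(src, axis=1, keepdims=True), 1e-12)
    unit = src / norms
    sim = unit @ unit.T

    for i in np.flatnonzero(~observed):
        s = sim[i, donors]                 # similarity to samples that have `tgt`
        top = np.argsort(-s)[:k]           # indices of the k most similar donors
        w = np.exp((s[top] - s[top].max()) / temperature)
        w /= w.sum()                       # softmax weights over the k neighbours
        filled[i] = w @ filled[donors[top]]  # weighted integration of neighbours
    return filled

# Toy usage: samples 1 and 3 lack the second modality.
src = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
tgt = np.array([[1.0, 1.0], [np.nan, np.nan], [5.0, 5.0], [np.nan, np.nan]])
observed = np.array([True, False, True, False])
completed = impute_missing_modality(src, tgt, observed, k=2)
```

In this toy setup each missing row is pulled toward the target features of the donors it most resembles in the available modality; a smaller `temperature` concentrates the weight on the closest neighbor, while a larger one averages more evenly across the k neighbors.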

Cross-Modality Retrieval by Joint Correlation Learning

Multiple Kernel Visual-Auditory Representation Learning for Retrieval

Multimedia Feature Mapping and Correlation Learning for Cross-Modal Retrieval

Cross-modal Semantic Correlation Learning by Bi-CNN Network

Adversarial Learning-Based Semantic Correlation Representation for Cross-Modal Retrieval

Learning Explicit and Implicit Latent Common Spaces for Audio-Visual Cross-Modal Retrieval

Learning Joint Embedding for Cross-Modal Retrieval

Cross-modal correlation learning with deep convolutional architecture

Combining Link And Content Correlation Learning For Cross-Modal Retrieval In Social Multimedia

Cross-modality Correlation Propagation for Cross-Media Retrieval

Analyzing semantic correlation for cross-modal retrieval

Learning Semantic Correlations for Cross-Media Retrieval

Joint Dictionary Learning and Semantic Constrained Latent Subspace Projection for Cross-Modal Retrieval

Cross-modal Retrieval Based on Deep Correlated Network

Cross-modal Image-Text Retrieval with Multitask Learning

Cross-Modal Learning Based on Semantic Correlation and Multi-Task Learning for Text-Video Retrieval

Deep Semantic Correlation with Adversarial Learning for Cross-Modal Retrieval

CCL: Cross-modal Correlation Learning with Multi-grained Fusion by Hierarchical Network

A New Approach to Cross-Modal Retrieval

Incomplete Cross-Modal Retrieval with Deep Correlation Transfer

Correspondence Autoencoders for Cross-Modal Retrieval