Abstract: Cross-modal retrieval has drawn wide interest for retrieval across different modalities (such as text, image, video, audio, and 3-D model). However, existing methods based on deep neural networks often face the challenge of insufficient cross-modal training data, which limits training effectiveness and easily leads to overfitting. Transfer learning is usually adopted to relieve the problem of insufficient training data, but it mainly focuses on knowledge transfer from large-scale datasets as a single-modal source domain (such as ImageNet) to a single-modal target domain. In fact, such large-scale single-modal datasets also contain rich modal-independent semantic knowledge that can be shared across different modalities. Besides, large-scale cross-modal datasets are very labor-intensive to collect and label, so it is important to fully exploit the knowledge in single-modal datasets to boost cross-modal retrieval. To achieve this goal, this paper proposes a modal-adversarial hybrid transfer network (MHTN), which aims to realize knowledge transfer from a single-modal source domain to a cross-modal target domain and to learn a cross-modal common representation. It is an end-to-end architecture with two subnetworks. First, a modal-sharing knowledge transfer subnetwork is proposed to jointly transfer knowledge from a single modality in the source domain to all modalities in the target domain with a star network structure, which distills modal-independent supplementary knowledge for promoting cross-modal common representation learning. Second, a modal-adversarial semantic learning subnetwork is proposed to construct an adversarial training mechanism between the common representation generator and the modality discriminator, making the common representation discriminative for semantics but indiscriminative for modalities, so as to enhance cross-modal semantic consistency during the transfer process. Comprehensive experiments on four widely used datasets show the effectiveness of MHTN.
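
The adversarial mechanism described in the abstract can be illustrated with a small sketch. The abstract does not state the loss functions, so everything below is an assumption made for illustration: softmax cross-entropy is used for both the semantic classifier and the modality discriminator, and the function names and the trade-off weight `lam` are hypothetical. The generator is rewarded when semantic labels are easy to predict from the common representation and when the modality discriminator is confused.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Negative log-likelihood of the true label under softmax(logits).
    return -math.log(softmax(logits)[label])

def discriminator_objective(mod_logits, mod_label):
    # Assumed form: the modality discriminator minimizes its own
    # classification loss (it tries to tell image from text, etc.).
    return cross_entropy(mod_logits, mod_label)

def generator_objective(sem_logits, sem_label, mod_logits, mod_label, lam=0.5):
    # Assumed form: the common-representation generator minimizes the
    # semantic loss while maximizing the discriminator's loss, weighted
    # by the hypothetical trade-off parameter `lam`.
    l_sem = cross_entropy(sem_logits, sem_label)
    l_mod = cross_entropy(mod_logits, mod_label)
    return l_sem - lam * l_mod

# Same semantic prediction, two discriminator states:
# one that identifies the modality confidently, one that cannot.
confident = generator_objective([2.0, 0.0], 0, [3.0, 0.0], 0)
confused = generator_objective([2.0, 0.0], 0, [0.0, 0.0], 0)
```

Under these assumptions the generator's objective is lower when the discriminator is confused, which is the sense in which the representation becomes "indiscriminative for modalities" while staying discriminative for semantics.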

Modal-adversarial Semantic Learning Network for Extendable Cross-modal Retrieval

X-GACMN: An X-Shaped Generative Adversarial Cross-Modal Network With Hypersphere Embedding

MHTN: Modal-Adversarial Hybrid Transfer Network for Cross-Modal Retrieval

Adversarial Cross-Modal Retrieval via Learning and Transferring Single-Modal Similarities

Cross-Modal Search for Social Networks via Adversarial Learning

Learning Explicit and Implicit Latent Common Spaces for Audio-Visual Cross-Modal Retrieval

Adversarial Cross-Modal Retrieval

Deep Attentional Fine-Grained Similarity Network with Adversarial Learning for Cross-Modal Retrieval

Deep Supervised Dual Cycle Adversarial Network for Cross-Modal Retrieval

Adversarial Learning-Based Semantic Correlation Representation for Cross-Modal Retrieval

Adversarial Graph Convolutional Network for Cross-Modal Retrieval

Joint Dictionary Learning and Semantic Constrained Latent Subspace Projection for Cross-Modal Retrieval

Dual graph-structured semantics multi-subspace learning for cross-modal retrieval

Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment

Deep Multi-Graph Hierarchical Enhanced Semantic Representation for Cross-Modal Retrieval

Semantic enhancement and multi-level alignment network for cross-modal retrieval

Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval

Weighted Graph-structured Semantics Constraint Network for Cross-Modal Retrieval

Modality-Specific Cross-Modal Similarity Measurement With Recurrent Attention Network