Abstract: Cross-modal retrieval tasks, which are more natural and challenging than traditional retrieval tasks, have attracted increasing interest from researchers in recent years. Although different modalities that share the same semantics are latently correlated, the heterogeneity of their feature spaces still seriously weakens the performance of cross-modal retrieval models. To address this problem, common space-based methods, in which multimodal data is projected into a learned common space for similarity measurement, have become the mainstream approach to cross-modal retrieval. However, current methods entangle modality style and semantic content in the common space, and they do not fully exploit semantic and discriminative representation and reconstruction of the semantic content. This often results in unsatisfactory retrieval performance. To solve these issues, this paper proposes a new Deep Supervised Dual Cycle Adversarial Network (DSDCAN) model based on common space learning. It is composed of two cross-modal cycle GANs, one for the image modality and one for the text modality. The proposed cycle GAN model disentangles the semantic content and modality style features by requiring that the data of one modality be well reconstructed from the extracted modality style feature and the content feature of the other modality. Then, a discriminative semantic and label loss that jointly considers category information, sample contrast, and label supervision is proposed to enhance the semantic discrimination of the common space representation. In addition, to make the data distributions of the two modalities similar, a second-order similarity is introduced as the distance measure for cross-modal representations in the common space. Extensive experiments have been conducted on the Wikipedia, Pascal Sentence, NUS-WIDE-10k, PKU XMedia, MSCOCO, NUS-WIDE, Flickr30k and MIRFlickr datasets. The results demonstrate that the proposed method outperforms state-of-the-art methods.
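As a rough illustration of the general common space idea described in the abstract (not of DSDCAN itself), the sketch below projects image and text features of different dimensionality into one shared space and ranks texts for each image by cosine similarity. The random linear projections `W_img` and `W_txt`, the feature sizes, and the helper names are all assumptions made for this example; in practice the projections would be learned encoders.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def rank_by_cosine(query_emb, gallery_emb):
    """Return, per query, gallery indices sorted from most to least similar."""
    q = l2_normalize(query_emb)
    g = l2_normalize(gallery_emb)
    sims = q @ g.T                      # shape: (num_queries, num_gallery)
    ranking = np.argsort(-sims, axis=1)  # descending similarity
    return ranking, sims

rng = np.random.default_rng(0)

# Hypothetical heterogeneous features: 4 images (8-d) and 5 texts (6-d).
img_feats = rng.normal(size=(4, 8))
txt_feats = rng.normal(size=(5, 6))

# Hypothetical stand-ins for learned encoders: linear maps into a 3-d common space.
W_img = rng.normal(size=(8, 3))
W_txt = rng.normal(size=(6, 3))

img_common = img_feats @ W_img
txt_common = txt_feats @ W_txt

# Image-to-text retrieval: rank all texts for every image query.
ranking, sims = rank_by_cosine(img_common, txt_common)
```

Here `ranking[i]` lists the text indices for image `i` in order of decreasing similarity. Swapping the two arguments of `rank_by_cosine` gives text-to-image retrieval over the same common space.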

Adversarial Learning For Cross-Modal Retrieval With Wasserstein Distance

X-GACMN: An X-Shaped Generative Adversarial Cross-Modal Network With Hypersphere Embedding

Category Alignment Adversarial Learning for Cross-modal Retrieval

Adversarial Cross-Modal Retrieval

Deep Joint Two-Stream Wasserstein Auto-Encoder and Selective Attention Alignment for Unsupervised Domain Adaptation

Adversarial Learning-Based Semantic Correlation Representation for Cross-Modal Retrieval

Dual discriminant adversarial cross-modal retrieval

Deep Attentional Fine-Grained Similarity Network with Adversarial Learning for Cross-Modal Retrieval

Shared Wasserstein adversarial domain adaption

Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval

Integrating information theory and adversarial learning for cross-modal retrieval

Adversarial pre-optimized graph representation learning with double-order sampling for cross-modal retrieval

Deep Supervised Dual Cycle Adversarial Network for Cross-Modal Retrieval

Adversarial Cross-Modal Retrieval via Learning and Transferring Single-Modal Similarities

Learning Explicit and Implicit Latent Common Spaces for Audio-Visual Cross-Modal Retrieval

Multicenter clinical trial of implanted norethindrone pellets for long-acting contraception in women. Program for Applied Research on Fertility Regulation.

Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment

Multimodal Adversarially Learned Inference with Factorized Discriminators

Wasserstein Distance Guided Representation Learning for Domain Adaptation

Discriminative Dictionary Learning with Common Label Alignment for Cross-Modal Retrieval.