Abstract: Cross-modal retrieval tasks, which are more natural and more challenging than traditional retrieval tasks, have attracted increasing research interest in recent years. Although data from different modalities with the same semantics are potentially related, the heterogeneity of their feature spaces still seriously weakens the performance of cross-modal retrieval models. To address this problem, common space-based methods, which project multimodal data into a learned common space for similarity measurement, have become the mainstream approach to cross-modal retrieval. However, current methods entangle modality style and semantic content in the common space and do not fully explore the semantic and discriminative representation/reconstruction of the semantic content, which often results in unsatisfactory retrieval performance. To solve these issues, this paper proposes a new Deep Supervised Dual Cycle Adversarial Network (DSDCAN) model based on common space learning. It is composed of two cross-modal cycle GANs, one for the image modality and one for the text modality. The proposed cycle GAN model disentangles the semantic content and modality style features by requiring that the data of one modality be well reconstructed from the extracted modality style feature and the content feature of the other modality. Then, a discriminative semantic and label loss is proposed that jointly considers category, sample contrast, and label supervision to enhance the semantic discrimination of the common space representation. In addition, to make the data distributions of the two modalities similar, a second-order similarity is introduced as the distance measure for cross-modal representations in the common space. Extensive experiments were conducted on the Wikipedia, Pascal Sentence, NUS-WIDE-10k, PKU XMedia, MSCOCO, NUS-WIDE, Flickr30k, and MIRFlickr datasets. The results demonstrate that the proposed method achieves higher performance than state-of-the-art methods.
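
The cross-modal reconstruction idea described in the abstract can be illustrated with a minimal sketch. It is not the DSDCAN implementation: the random linear maps stand in for learned encoders and decoders, all names and dimensions are hypothetical, the style feature is assumed to come from the modality being reconstructed, and the adversarial, discriminative, and second-order similarity terms are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: image features, text features, shared content, per-modality style.
D_IMG, D_TXT, D_CONTENT, D_STYLE = 16, 12, 4, 3


def linear(d_in, d_out):
    """Random linear map standing in for a learned encoder or decoder."""
    return rng.normal(scale=0.1, size=(d_in, d_out))


# Each modality gets a content encoder, a style encoder, and a decoder (all assumed).
enc_img_content = linear(D_IMG, D_CONTENT)
enc_img_style = linear(D_IMG, D_STYLE)
enc_txt_content = linear(D_TXT, D_CONTENT)
enc_txt_style = linear(D_TXT, D_STYLE)
dec_img = linear(D_STYLE + D_CONTENT, D_IMG)
dec_txt = linear(D_STYLE + D_CONTENT, D_TXT)


def cross_reconstruction_loss(img, txt):
    """Mean squared error of rebuilding each modality from its own style
    feature and the content feature of the paired sample in the other modality."""
    img_code = np.concatenate([img @ enc_img_style, txt @ enc_txt_content], axis=1)
    txt_code = np.concatenate([txt @ enc_txt_style, img @ enc_img_content], axis=1)
    img_hat = img_code @ dec_img
    txt_hat = txt_code @ dec_txt
    return float(np.mean((img - img_hat) ** 2) + np.mean((txt - txt_hat) ** 2))


# A batch of 8 paired image/text feature vectors (random placeholders).
img_batch = rng.normal(size=(8, D_IMG))
txt_batch = rng.normal(size=(8, D_TXT))
loss = cross_reconstruction_loss(img_batch, txt_batch)
```

Minimizing a loss of this shape only succeeds if the content code carries the semantics shared by both modalities, which is the intuition behind swapping content features across the pair.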

Deep Supervised Dual Cycle Adversarial Network for Cross-Modal Retrieval
