Abstract: Exploiting relationships among samples in cross-modal data plays a key role in cross-modal retrieval, but most existing methods extract correlations only from pairwise samples and ignore the relations among unpaired samples. Some graph regularization methods offer a reasonable paradigm for exploiting correlations across multiple samples. However, limited by the traditional framework, their performance leaves much room for improvement. Moreover, although some existing DNN-based methods achieve excellent performance, their need for massive amounts of labeled data is a shortcoming. In this paper, we propose a novel semi-supervised method, named Semi-supervised Constrained Graph Convolutional Network (SCGCN), which adopts a graph convolutional network (GCN) to exploit correlations among batches of samples from different modalities. To reduce the requirement for labeled data, we design a two-stage training procedure: a deep supervised learning stage and an unsupervised learning stage. In the deep supervised learning stage, we integrate two DNN-based semantic encoding networks and a shared classifier into a Deep Cross-modal Semantic Encoding (DCSE) module, which is trained by supervised learning on labeled data. From the DCSE module, we learn a temporary modality-invariant space in which the semantic embeddings of samples from different modalities are aligned, and we also learn a classifier that can generate predicted labels for the unlabeled data. In the unsupervised learning stage, to fully exploit the correlations in cross-modal data, we design a Constrained Graph Convolutional Network (CGCN) module, which uses a GCN to exploit the correlations and adopts both an intra-modal discriminative loss and an inter-modal pairwise similarity loss to ensure that the generated common representation is modality-invariant and semantically discriminative. We perform extensive experiments on four conventional datasets and a large-scale dataset to demonstrate the effectiveness of the proposed approach.
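
To make the idea of exploiting correlations within a batch of mixed-modality samples concrete, the following is a minimal NumPy sketch of a single graph-convolution step over a batch of image and text embeddings. It is an illustration under stated assumptions, not the SCGCN implementation: the abstract does not specify how the graph is built or which propagation rule is used, so the cosine k-nearest-neighbour graph, the symmetric normalization with self-loops (the standard Kipf and Welling formulation), the random embeddings standing in for DCSE outputs, and the names `cosine_adjacency` and `gcn_layer` are all hypothetical choices.

```python
import numpy as np


def cosine_adjacency(x, k=3):
    """Build a symmetric k-nearest-neighbour graph from cosine similarity.

    Assumption: the abstract does not say how the graph is constructed;
    a cosine kNN graph is one common choice.
    """
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # exclude self when ranking neighbours
    nearest = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar samples
    adj = np.zeros_like(sim)
    rows = np.repeat(np.arange(x.shape[0]), k)
    adj[rows, nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)             # symmetrise the directed kNN edges


def gcn_layer(h, adj, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Assumption: this is the standard GCN rule, used here for illustration.
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ weight, 0.0)


rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 8))   # stand-in for image semantic embeddings
text_emb = rng.normal(size=(4, 8))    # stand-in for text semantic embeddings

# Stack both modalities into one batch so edges can link paired and unpaired samples.
batch = np.vstack([image_emb, text_emb])
adj = cosine_adjacency(batch, k=3)
weight = rng.normal(scale=0.1, size=(8, 5))  # untrained layer weights
common = gcn_layer(batch, adj, weight)
print(common.shape)
```

Because the graph spans the whole batch, each output row mixes information from its neighbours regardless of modality, which is how a graph-based model can draw on unpaired as well as paired samples. In a trained model the weights would be learned under the intra-modal and inter-modal losses described in the abstract, which this sketch omits.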

Cross-Modal Retrieval with Discriminative Dual-Path CNN

Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation.

Dual discriminant adversarial cross-modal retrieval

CMPD: Using Cross Memory Network With Pair Discrimination for Image-Text Retrieval

Deep Supervised Dual Cycle Adversarial Network for Cross-Modal Retrieval

Dual-View Curricular Optimal Transport for Cross-Lingual Cross-Modal Retrieval

Cross‐modal retrieval with dual multi‐angle self‐attention

Multi-task hierarchical convolutional network for visual-semantic cross-modal retrieval

Deep Supervised Cross-Modal Retrieval

Cross-modal Image-Text Retrieval with Multitask Learning

Semantic-enhanced discriminative embedding learning for cross-modal retrieval

Semi-supervised constrained graph convolutional network for cross-modal retrieval

Discriminative Cross-Modal Transfer Learning and Densely Cross-Level Feedback Fusion for RGB-D Salient Object Detection

Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions

Cross-modal Common Representation Learning by Hybrid Transfer Network

Modality-dependent Cross-media Retrieval

Cross-modal retrieval by an end to end way

Cross-modal Contrastive Learning for Generalizable and Efficient Image-text Retrieval

Discriminative Dictionary Learning with Common Label Alignment for Cross-Modal Retrieval.

Dual-path Convolutional Image-Text Embeddings with Instance Loss