Abstract: Exploiting the relationships among samples in cross-modal data plays a key role in cross-modal retrieval, but most existing methods extract correlation only from pairwise samples and ignore the relations among unpaired samples. Some graph regularization methods offer a reasonable paradigm for exploiting correlation across multiple samples; however, limited by their traditional framework, their performance leaves much room for improvement. Moreover, although some existing DNN-based methods achieve excellent performance, they require massive amounts of labeled data. In this paper, we propose a novel semi-supervised method, named Semi-supervised Constrained Graph Convolutional Network (SCGCN), which adopts a graph convolutional network (GCN) to exploit the correlation among batch samples from different modalities. To reduce the requirement for labeled data, we design a two-stage training procedure: a deep supervised learning stage and an unsupervised learning stage. In the deep supervised learning stage, we integrate two DNN-based semantic encoding networks and a shared classifier into a Deep Cross-modal Semantic Encoding (DCSE) module, which is trained by supervised learning on labeled data. From the DCSE module, we learn a temporary modality-invariant space in which the semantic embeddings of samples from different modalities are modality-invariant, and we also learn a classifier that can generate predicted labels for the unlabeled data. In the unsupervised learning stage, to fully exploit the correlation in cross-modal data, we design a Constrained Graph Convolutional Network (CGCN) module, which uses a GCN to exploit the correlation and adopts both an intra-modal discriminative loss and an inter-modal pairwise similarity loss to ensure that the generated common representation is modality-invariant and semantically discriminative. We perform extensive experiments on four conventional datasets and a large-scale dataset to demonstrate the effectiveness of the proposed approach.
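
To make the idea of exploiting correlation within a batch concrete, the following is a minimal sketch of a single graph convolution step using the standard propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} X W). It is not the SCGCN implementation: the abstract does not specify how the batch graph is built, so the `label_adjacency` helper (connecting samples that share a label, for example one predicted by the classifier) is a hypothetical choice, and all function names, features, and weights are illustrative assumptions.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    a_norm = normalize_adjacency(adj)
    hidden = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


def label_adjacency(labels):
    """Hypothetical graph construction: link samples that share a label."""
    n = len(labels)
    return [[1.0 if i != j and labels[i] == labels[j] else 0.0
             for j in range(n)]
            for i in range(n)]


# Toy batch of four samples (e.g. two images and two texts) in two classes.
labels = [0, 0, 1, 1]
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weight, for readability only

adj = label_adjacency(labels)
out = gcn_layer(adj, features, weight)
```

With an identity weight matrix, each output row is the average of the features of the samples in the same class, which illustrates how a graph convolution pulls same-class samples from different modalities toward a shared representation.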

Semi-supervised constrained graph convolutional network for cross-modal retrieval
