Abstract: Exploiting the relationships among samples in cross-modal data plays a key role in cross-modal retrieval, but most existing methods extract correlation only from pairwise samples and ignore the relations among unpaired samples. Some graph regularization methods offer a reasonable paradigm for exploiting the correlation among multiple samples; however, limited by the traditional framework, their performance leaves much room for improvement. Moreover, although some existing DNN-based methods achieve excellent performance, their need for massive amounts of labeled data is a shortcoming. In this paper, we propose a novel semi-supervised method, named Semi-supervised Constrained Graph Convolutional Network (SCGCN), which adopts a graph convolutional network (GCN) to exploit the correlation among batch samples of data from different modalities. To reduce the requirement for labeled data, we design a two-stage training procedure: a deep supervised learning stage and an unsupervised learning stage. In the deep supervised learning stage, we integrate two DNN-based semantic encoding networks and a shared classifier into a Deep Cross-modal Semantic Encoding (DCSE) module, which is trained by supervised learning on labeled data. From the DCSE module, we learn a temporary modality-invariant space in which the semantic embeddings of samples from different modalities are modality-invariant, and we also learn a classifier that can generate predicted labels for the unlabeled data. In the unsupervised learning stage, to fully exploit the correlation in cross-modal data, we design a Constrained Graph Convolutional Network (CGCN) module, which uses a GCN to exploit the correlation and adopts both an intra-modal discriminative loss and an inter-modal pairwise similarity loss to ensure that the generated common representation is modality-invariant and semantically discriminative. We perform extensive experiments on four conventional datasets and one large-scale dataset to demonstrate the effectiveness of the proposed approach.
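
As a rough illustration of the graph convolution idea described in the abstract, the NumPy sketch below builds a graph over a mixed batch of image and text embeddings and applies one propagation step, so that each sample's representation is aggregated from its neighbours in either modality. The k-nearest-neighbour cosine graph, the symmetric normalization, the random embeddings and weights, and all function names are assumptions made for illustration; the abstract does not specify how SCGCN constructs its graph or its layers.

```python
import numpy as np

def build_knn_adjacency(features, k=2):
    """Symmetric k-NN adjacency from cosine similarity (assumed graph construction)."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude each sample from its own neighbour list
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros(sim.shape)
    rows = np.repeat(np.arange(len(features)), k)
    adj[rows, neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # keep an edge if either endpoint selected it

def gcn_layer(features, adj, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)

rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 8))  # stand-in for image semantic embeddings
text_emb = rng.normal(size=(4, 8))   # stand-in for text semantic embeddings
batch = np.vstack([image_emb, text_emb])  # one graph over both modalities

adj = build_knn_adjacency(batch, k=2)
weight = rng.normal(size=(8, 5))     # hypothetical layer weights (untrained)
common = gcn_layer(batch, adj, weight)
print(common.shape)
```

In a trained model the weights would be learned under the losses named in the abstract; here they are random, so the output only demonstrates the shape and flow of the computation.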

Improving Supervised Cross-modal Retrieval with Semantic Graph Embedding

Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation

Graph Embedding Learning for Cross-Modal Information Retrieval

Cross-Graph Attention Enhanced Multi-Modal Correlation Learning for Fine-Grained Image-Text Retrieval

Deep Multi-Graph Hierarchical Enhanced Semantic Representation for Cross-Modal Retrieval

Semi-supervised constrained graph convolutional network for cross-modal retrieval

Learning Cross-Modal Aligned Representation with Graph Embedding

Semantic Modeling of Textual Relationships in Cross-modal Retrieval

Weighted Graph-structured Semantics Constraint Network for Cross-Modal Retrieval

Iterative graph attention memory network for cross-modal retrieval

Semantic-enhanced discriminative embedding learning for cross-modal retrieval

Dual graph-structured semantics multi-subspace learning for cross-modal retrieval

SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention

Rethinking Label-Wise Cross-Modal Retrieval from A Semantic Sharing Perspective

Cross-modal Metric Learning with Graph Embedding

Relation-Aggregated Cross-Graph Correlation Learning for Fine-Grained Image–Text Retrieval

Semantic enhancement and multi-level alignment network for cross-modal retrieval

Local Semantic Correlation Modeling Over Graph Neural Networks for Deep Feature Embedding and Image Retrieval

Adversarial Graph Convolutional Network for Cross-Modal Retrieval

Semantically Supervised Maximal Correlation for Cross-Modal Retrieval

Federated learning for supervised cross-modal retrieval