Abstract: Exploiting the relationships among samples in cross-modal data plays a key role in cross-modal retrieval, but most existing methods extract correlation only from pairwise samples and ignore the relations among unpaired samples. Some graph regularization methods offer a reasonable paradigm for exploiting correlation across multiple samples; however, limited by their traditional framework, their performance leaves much room for improvement. Moreover, although some existing DNN-based methods achieve excellent performance, they require massive amounts of labeled data. In this paper, we propose a novel semi-supervised method, named Semi-supervised Constrained Graph Convolutional Network (SCGCN), which adopts a graph convolutional network (GCN) to exploit correlation among batch samples from different modalities. To reduce the requirement for labeled data, we design a two-stage training procedure: a deep supervised learning stage and an unsupervised learning stage. In the deep supervised learning stage, we integrate two DNN-based semantic encoding networks and a shared classifier into a Deep Cross-modal Semantic Encoding (DCSE) module, which is trained by supervised learning on labeled data. From the DCSE module, we learn a temporary space in which the semantic embeddings of samples from different modalities are modality-invariant, and we also learn a classifier that can generate predicted labels for the unlabeled data. In the unsupervised learning stage, to fully exploit the correlation in cross-modal data, we design a Constrained Graph Convolutional Network (CGCN) module, which uses a GCN to exploit the correlation and adopts both an intra-modal discriminative loss and an inter-modal pairwise similarity loss to ensure that the generated common representation is modality-invariant and semantically discriminative. We perform extensive experiments on four conventional datasets and a large-scale dataset to demonstrate the effectiveness of the proposed approach.
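
To make the idea of applying a GCN to a batch of samples from two modalities more concrete, the following is a minimal NumPy sketch, not the SCGCN implementation. The abstract does not say how the batch graph is built or how the losses are defined, so everything here is an assumption made for illustration: the graph uses clipped cosine similarity as edge weights, the layer is the standard symmetric-normalized graph convolution, and the inter-modal pairwise loss is a mean squared distance between paired representations. All names (`normalized_adjacency`, `gcn_layer`, `pairwise_similarity_loss`) are hypothetical, and the embeddings are random stand-ins for the outputs of the semantic encoders.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(features, adj, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(normalized_adjacency(adj) @ features @ weight, 0.0)

def pairwise_similarity_loss(img_repr, txt_repr):
    """Assumed loss: mean squared distance between paired image/text rows."""
    return float(np.mean(np.sum((img_repr - txt_repr) ** 2, axis=1)))

rng = np.random.default_rng(0)
n_pairs, dim, out_dim = 4, 8, 3

# Random stand-ins for semantic embeddings of paired images and texts.
img_emb = rng.normal(size=(n_pairs, dim))
txt_emb = rng.normal(size=(n_pairs, dim))

# Batch graph over all 2 * n_pairs nodes (images first, then texts).
# Assumption: edge weight = cosine similarity clipped at zero, no self-edges.
features = np.vstack([img_emb, txt_emb])
unit = features / np.linalg.norm(features, axis=1, keepdims=True)
adj = np.clip(unit @ unit.T, 0.0, None)
np.fill_diagonal(adj, 0.0)

# Propagate through one layer and split back into the two modalities.
weight = rng.normal(size=(dim, out_dim)) * 0.1
common = gcn_layer(features, adj, weight)
img_common, txt_common = common[:n_pairs], common[n_pairs:]

loss = pairwise_similarity_loss(img_common, txt_common)
```

In this sketch every node aggregates information from similar nodes in both modalities, including unpaired ones, which is the property the abstract attributes to graph-based approaches. An intra-modal discriminative term based on labels or predicted labels would be added alongside `loss` in a full training objective.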

Cross-modal retrieval based on fusion lightweight network

Iterative graph attention memory network for cross-modal retrieval

Weighted Graph-structured Semantics Constraint Network for Cross-Modal Retrieval

Feature Fusion Based on Transformer for Cross-modal Retrieval

CMCI: A Robust Multimodal Fusion Method for Spiking Neural Networks

Cross-modal retrieval based on multi-dimensional feature fusion hashing

Semantic Guidance Fusion Network for Cross-Modal Semantic Segmentation

Adversarial Cross-Modal Retrieval via Learning and Transferring Single-Modal Similarities

Cross-Modal Hash Method Based on Multi-Scale Fusion and Projection Matching Constraint

Cross‐modal retrieval with dual multi‐angle self‐attention

CCL: Cross-modal Correlation Learning with Multi-grained Fusion by Hierarchical Network

A Crossmodal Multiscale Fusion Network for Semantic Segmentation of Remote Sensing Data

Federated learning for supervised cross-modal retrieval

Modality-Specific Cross-Modal Similarity Measurement With Recurrent Attention Network

Complementarity is the king: Multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval

Semi-supervised constrained graph convolutional network for cross-modal retrieval

Semantic enhancement and multi-level alignment network for cross-modal retrieval

Heterogeneous memory enhanced graph reasoning network for cross-modal retrieval

Deep Multi-Graph Hierarchical Enhanced Semantic Representation for Cross-Modal Retrieval

Complementarity-aware cross-modal feature fusion network for RGB-T semantic segmentation

CMPD: Using Cross Memory Network With Pair Discrimination for Image-Text Retrieval