Abstract: Cross-media retrieval has become a key problem in both research and application: users submit a query of any media type and retrieve results across all media types (text, image, audio, video, and 3-D). The key challenge is how to measure content similarity among different media. Existing cross-media retrieval methods usually model either the pairwise correlation or the semantic information, but not both. In fact, these two kinds of information are complementary, and optimizing them simultaneously can further improve accuracy. In this paper, we propose a novel feature learning algorithm for cross-media data, called joint representation learning (JRL), which jointly explores the correlation and semantic information in a unified optimization framework. JRL integrates sparse and semisupervised regularization for different media types into one unified optimization problem, whereas existing feature learning methods generally focus on a single media type. On the one hand, JRL learns sparse projection matrices for the different media types simultaneously, so that the media align with each other in a way that is robust to noise. On the other hand, JRL exploits both the labeled and the unlabeled data of the different media types. Unlabeled examples increase the diversity of the training data and boost the performance of joint representation learning. Furthermore, JRL not only reduces the dimension of the original features but also incorporates the cross-media correlation into the final representation, which further improves the performance of both cross-media retrieval and single-media retrieval. Experiments on two datasets with up to five media types show the effectiveness of the proposed approach compared with state-of-the-art methods.
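The retrieval setting described in the abstract (one projection per media type into a shared space, then ranking by similarity) can be illustrated with a minimal sketch. This is not the JRL algorithm: plain ridge regression onto one-hot labels stands in for the sparse and semisupervised objective, the features are synthetic, and every name here (`fit_projection`, `cosine_rank`, `X_img`, `X_txt`) is a hypothetical placeholder.

```python
import numpy as np

def fit_projection(X, Y, reg=1e-2):
    """Ridge projection P minimizing ||X P - Y||_F^2 + reg * ||P||_F^2.

    Assumption: a simple stand-in for a learned projection matrix,
    not the sparse, semisupervised objective proposed in the paper.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y)

def cosine_rank(query, gallery):
    """Return gallery indices sorted by decreasing cosine similarity to query."""
    q = query / (np.linalg.norm(query) + 1e-12)
    g = gallery / (np.linalg.norm(gallery, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(g @ q))

# Synthetic data: two media types with different feature dimensions
# that share the same semantic labels.
rng = np.random.default_rng(0)
n_classes, n_per_class = 3, 20
labels = np.repeat(np.arange(n_classes), n_per_class)
Y = np.eye(n_classes)[labels]  # one-hot semantic targets

img_means = rng.normal(size=(n_classes, 8))
txt_means = rng.normal(size=(n_classes, 5))
X_img = img_means[labels] + 0.1 * rng.normal(size=(len(labels), 8))
X_txt = txt_means[labels] + 0.1 * rng.normal(size=(len(labels), 5))

# One projection per media type, both mapping into the same label space.
P_img = fit_projection(X_img, Y)
P_txt = fit_projection(X_txt, Y)
Z_img = X_img @ P_img
Z_txt = X_txt @ P_txt

# Cross-media query: rank all text items against the first image.
ranking = cosine_rank(Z_img[0], Z_txt)
print("query label:", labels[0], "top-5 retrieved labels:", labels[ranking[:5]])
```

Because both media types end up in a space of the same dimension, an image query can be compared directly with text items, even though the raw features have different sizes (8 and 5 in this sketch).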

Enhanced Isomorphic Semantic Representation For Cross-Media Retrieval

Online latent semantic hashing for cross-media retrieval.

Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation.

Crossmedia retrieval by learning rich semantic embeddings of multimedia

Semantic Consistency Hashing for Cross-Modal Retrieval

Learning a Semantic Space by Deep Network for Cross-media Retrieval.

Learning Discriminative Representations for Semantic Cross Media Retrieval

Cross-Media Retrieval by Multimodal Representation Fusion with Deep Networks.

Cross-media semantic representation via bi-directional learning to rank.

Latent Semantic Factorization for Multimedia Representation Learning

Cross-Media Retrieval via Semantic Entity Projection.

Manifold Learning Based Cross-media Retrieval: A Solution to Media Object Complementary Nature

Semantic Preservation and Hash Fusion Network for Unsupervised Cross-Modal Retrieval

A Novel Approach Towards Large Scale Cross-Media Retrieval.

Learning Cross-Media Joint Representation with Sparse and Semisupervised Regularization

A Benchmark Dataset and Learning High-Level Semantic Embeddings of Multimedia for Cross-Media Retrieval.

Structures Aware Fine-grained Contrastive Adversarial Hashing for Cross-media Retrieval

Modality-dependent Cross-media Retrieval

Semantic Boosting Cross-Modal Hashing for Efficient Multimedia Retrieval.

Learning Semantic Correlations for Cross-Media Retrieval.

Cross-Media Retrieval: Concepts, Advances And Challenges