Abstract: With the development of computer networks, multimedia, and digital transmission technology in recent years, information dissemination has shifted from a traditional, mainly text-based form to a multimedia form that includes text, images, video, audio, and so on. To meet users' growing demand for access to multimedia information, cross-media retrieval has become a key problem in both research and application. Given a query of any media type, cross-media retrieval returns semantically similar results of all relevant media types. To measure the similarity between different media types, it is important to learn a better shared representation for multimedia data. Existing methods mainly extract a single-modal representation for each media type and then learn cross-media correlations under pairwise similarity constraints; they therefore cannot make full use of the rich information within each media type, and they ignore the dissimilarity constraints between different media types. To address these problems, this paper proposes a deep multimodal learning method (DML) for cross-media shared representation learning. First, we adopt two different deep networks for each media type with multimodal learning, which yields a high-level semantic representation of each single media type. Then, a two-pathway network is constructed that jointly models the pairwise similar and dissimilar constraints with a contrastive loss to obtain the shared representation. Experiments conducted on two widely used cross-media datasets show the effectiveness of the proposed method.
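The abstract does not state the exact form of the contrastive loss, so the following is only a minimal sketch under stated assumptions: it uses the common margin-based formulation, in which similar pairs are pulled together and dissimilar pairs are pushed apart until they are at least `margin` away. The function names, the Euclidean distance, and the default `margin=1.0` are illustrative choices, not details taken from the paper. The two input vectors stand in for the outputs of the two pathways (for example, one image representation and one text representation).

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(repr_a, repr_b, is_similar, margin=1.0):
    """Margin-based contrastive loss for one cross-media pair (assumed form).

    repr_a, repr_b: shared-space representations from the two pathways.
    is_similar:     True for a semantically matching pair, False otherwise.
    margin:         hypothetical hyperparameter; dissimilar pairs farther
                    apart than this contribute zero loss.
    """
    d = euclidean_distance(repr_a, repr_b)
    if is_similar:
        # Similar constraint: penalize any distance between the pair.
        return 0.5 * d ** 2
    # Dissimilar constraint: penalize only pairs closer than the margin.
    return 0.5 * max(0.0, margin - d) ** 2


def batch_contrastive_loss(pairs, margin=1.0):
    """Mean loss over an iterable of (repr_a, repr_b, is_similar) triples."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("pairs must not be empty")
    total = sum(contrastive_loss(a, b, s, margin) for a, b, s in pairs)
    return total / len(pairs)
```

In this sketch a similar pair always contributes a loss proportional to its squared distance, whereas a dissimilar pair contributes only while it lies inside the margin, which is how the similar and dissimilar constraints can be modeled jointly in one objective.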

Learning Joint Multimodal Representation Based On Multi-Fusion Deep Neural Networks

Dense Multimodal Fusion for Hierarchically Joint Representation

Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion

A Survey on Deep Learning for Multimodal Data Fusion

Deep Multimodal Data Fusion

Multimodal Medical Image Fusion: The Perspective of Deep Learning

Dual Low-Rank Multimodal Fusion

Multimodal Deep Representation Learning for Video Classification

On Uni-modal Feature Learning in Multi-modal Learning

CMCI: A Robust Multimodal Fusion Method for Spiking Neural Networks

Deep Fusion Of Heterogeneous Sensor Data

Deep Equilibrium Multimodal Fusion

Learn to Combine Modalities in Multimodal Deep Learning

What Makes Multi-modal Learning Better than Single (Provably)

Cross-Media Retrieval by Multimodal Representation Fusion with Deep Networks

On Uni-Modal Feature Learning in Supervised Multi-Modal Learning

On the Benefits of Early Fusion in Multimodal Representation Learning

An Effective Multimodal Representation and Fusion Method for Multimodal Intent Recognition

MIMF: Mutual Information-Driven Multimodal Fusion

Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond

Weakly paired multimodal fusion using multilayer extreme learning machine