Abstract: Multi-modal learning is of practical importance for real-world datasets with heterogeneous features. Most existing multi-modal algorithms aim to learn, in an unsupervised manner, shared representations that maximally capture the correlations among multi-modal data. Deep variational canonical correlation analysis-private (DVCCA-private) is an efficient multi-modal generative learning model in which each modality is generated by “common” variables shared by both modalities and “private” variables specific to that modality. After training, the inferred common representations are used to train a classifier, and competitive performance can be achieved. In many multi-modal datasets, however, each modality may contain discriminative information that the other modalities lack, and the information common to the modalities may be irrelevant for discrimination. In this paper, we propose a discriminative multi-modal deep generative model (DMDGM), in which each modality is generated by “label” variables shared by both modalities and “private” variables. The proposed model can separate discriminative information from discrimination-insensitive information. We derive variational lower bounds on the data likelihood and train the model by maximizing them. The proposed model combines representation learning and classifier training in a unified framework, and the inferred “label” representations are used directly for label prediction without an additional classifier. Empirical results on the Noisy-MNIST, XRMB and NUS datasets show that the proposed DMDGM can significantly improve prediction performance compared with previous multi-modal deep generative models.
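The abstract does not state the lower bound itself. As a rough illustration only, the following sketches the generic evidence lower bound for a two-modality model with one shared "label" variable and one "private" variable per modality. The symbols $x_1, x_2$ (modalities), $y$ (label variable), $h_1, h_2$ (private variables), the independent priors, and the factorized posterior $q(y \mid x_1)\,q(h_1 \mid x_1)\,q(h_2 \mid x_2)$ are all assumptions chosen by analogy with DVCCA-private; they are not taken from the paper, whose actual bounds (for example for labeled data) may differ.

```latex
% Assumed generative model (illustrative, not the paper's stated form):
%   p(x_1, x_2, y, h_1, h_2) = p(y)\, p(h_1)\, p(h_2)\, p(x_1 \mid y, h_1)\, p(x_2 \mid y, h_2)
% Assumed variational posterior:
%   q(y, h_1, h_2 \mid x_1, x_2) = q(y \mid x_1)\, q(h_1 \mid x_1)\, q(h_2 \mid x_2)
\begin{align}
\log p(x_1, x_2)
  \;\ge\;& \mathbb{E}_{q(y \mid x_1)\, q(h_1 \mid x_1)}\big[\log p(x_1 \mid y, h_1)\big]
        + \mathbb{E}_{q(y \mid x_1)\, q(h_2 \mid x_2)}\big[\log p(x_2 \mid y, h_2)\big] \nonumber \\
      & - \mathrm{KL}\big(q(y \mid x_1) \,\|\, p(y)\big)
        - \mathrm{KL}\big(q(h_1 \mid x_1) \,\|\, p(h_1)\big)
        - \mathrm{KL}\big(q(h_2 \mid x_2) \,\|\, p(h_2)\big)
\end{align}
```

Under these assumptions, the two expectation terms reward reconstruction of each modality, while the three KL terms keep the inferred label and private posteriors close to their priors. Using $q(y \mid x_1)$ directly as the label predictor is one way a model of this kind could avoid a separately trained classifier.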

Discriminative Multi-Modal Deep Generative Models

Multimodal Adversarially Learned Inference with Factorized Discriminators

Discriminative multimodal learning via conditional priors in generative models

Max-Margin Deep Generative Models

Multi-Modal Latent Diffusion

Diffusion Models For Multi-Modal Generative Modeling

Multimodal Generative Models for Scalable Weakly-Supervised Learning

Multimodal deep generative adversarial models for scalable doubly semi-supervised learning

Generative-Discriminative Complementary Learning

A Multi-Player Minimax Game for Generative Adversarial Networks

Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond

DMGAN: Discriminative Metric-based Generative Adversarial Networks

SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization

A Discriminative Vectorial Framework for Multi-modal Feature Representation

Partial Modal Conditioned GANs for Multi-modal Multi-label Learning with Arbitrary Modal-Missing

Variational methods for Conditional Multimodal Deep Learning

Joint Multimodal Learning with Deep Generative Models

Common and Discriminative Semantic Pursuit for Multi-Modal Multi-Label Learning

Multi-Modal Generative Embedding Model

Neural generative model for clustering by separating particularity and commonality

Unified Generative and Discriminative Training for Multi-modal Large Language Models