Abstract: Multi-modal learning is of practical importance for real-world datasets with heterogeneous features. Most existing multi-modal algorithms aim to learn, in an unsupervised manner, shared representations that maximally capture the correlations among the modalities. Deep variational canonical correlation analysis-private (DVCCA-private) is an efficient multi-modal generative learning model in which each modality is generated by “common” variables underlying both modalities and “private” variables specific to that modality. After training, the inferred common representations are used to train a classifier, and competitive performance can be achieved. In many multi-modal datasets, however, each modality may contain discriminative information that the other modalities lack, and the information common to the modalities may be irrelevant for discrimination. In this paper, we propose a discriminative multi-modal deep generative model (DMDGM), in which each modality is generated by “label” variables underlying both modalities and “private” variables. The proposed model can separate discriminative information from discrimination-insensitive information. We derive variational lower bounds of the data likelihood and train the model by maximizing the resulting bound. The proposed model combines representation learning and classifier training in a unified framework, and the inferred “label” representations are used directly for label prediction without an additional classifier. Empirical results on the Noisy-MNIST, XRMB and NUS datasets show that the proposed DMDGM can significantly improve prediction performance compared with previous multi-modal deep generative models.
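
As an illustration only, the structure described in the abstract can be sketched for two modalities and a labeled example. The notation here is an assumption and is not taken from the paper: $x_1, x_2$ denote the two modalities, $y$ the shared “label” variable, $h_1, h_2$ the “private” variables, and $q(h_m \mid x_m)$ a hypothetical variational posterior. The paper's actual factorization, inference networks and treatment of unlabeled data may differ.

```latex
% Assumed generative factorization: both modalities share y, each has its own private h_m.
\[
p(x_1, x_2, y, h_1, h_2) = p(y)\, p(h_1)\, p(h_2)\, p(x_1 \mid y, h_1)\, p(x_2 \mid y, h_2)
\]

% A standard variational lower bound for a labeled example, obtained with Jensen's
% inequality under the assumed posterior q(h_1 | x_1) q(h_2 | x_2).
\[
\log p(x_1, x_2, y) \;\ge\; \log p(y)
  + \sum_{m=1}^{2} \Big(
      \mathbb{E}_{q(h_m \mid x_m)}\big[ \log p(x_m \mid y, h_m) \big]
      - \mathrm{KL}\big( q(h_m \mid x_m) \,\|\, p(h_m) \big)
    \Big)
\]
```

In this sketch, the reconstruction terms depend on $y$ and $h_m$ separately, which is one way the discriminative information could be kept apart from the modality-specific, discrimination-insensitive information.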

Latent Gaussian-Multinomial Generative Model for Annotated Data

Multimodal Latent Language Modeling with Next-Token Diffusion

CGMGM: A Cross-Gaussian Mixture Generative Model for Few-Shot Semantic Segmentation

Generative Label Enhancement with Gaussian Mixture and Partial Ranking

Latent Dirichlet Allocation Based Generative Adversarial Networks

Semi-Supervised Topic Modeling for Image Annotation

Using Local Discriminant Topic To Improve Generative Model Based Image Annotation

LLMGA: Multimodal Large Language Model based Generation Assistant

GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models

Generative Classification Model for Categorical Data Based on Latent Gaussian Process

Image Annotation by Latent Community Detection and Multikernel Learning

Effective Image Auto-Annotation Via Discriminative Hyperplane Tree Based Generative Model

Moderating the Generalization of Score-based Generative Model

LCBM: A Multi-View Probabilistic Model for Multi-Label Classification

Discriminative Multi-Modal Deep Generative Models

Approximated Anomalous Diffusion: Gaussian Mixture Score-based Generative Models

Hybrid Generative/Discriminative Learning for Automatic Image Annotation

Multinomial Latent Logistic Regression For Image Understanding

Effective Automatic Image Annotation Via Integrated Discriminative and Generative Models

GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data

A Revisit of Generative Model for Automatic Image Annotation Using Markov Random Fields