Discriminative Multi-Modal Deep Generative Models

Fang Du, Jiangshe Zhang, Junying Hu, Rongrong Fei
DOI: https://doi.org/10.1016/j.knosys.2019.02.023
IF: 8.139
2019-01-01
Knowledge-Based Systems
Abstract: Multi-modal learning is of practical importance for real-world datasets with heterogeneous features. Most existing multi-modal algorithms aim to learn shared representations that maximally extract the correlations among multi-modal data in an unsupervised manner. Deep variational canonical correlation analysis-private (DVCCA-private) is an efficient multi-modal generative learning model in which each modality is generated by “common” variables shared by both modalities and “private” variables specific to that modality. After training, the inferred common representations are used to train a classifier, which can achieve competitive performance. For many multi-modal datasets, however, each modality may contain discriminative information that the other modalities lack, and the information common to the modalities may be irrelevant for discrimination. In this paper, we propose a discriminative multi-modal deep generative model (DMDGM), in which each modality is generated by “label” variables shared by both modalities and “private” variables. The proposed model can separate discriminative information from discrimination-insensitive information. We derive variational lower bounds on the data likelihood and train the model by maximizing these bounds. The proposed model combines representation learning and classifier training in a unified framework, and the inferred “label” representations are used directly for label prediction without an additional classifier. Empirical results on the Noisy-MNIST, XRMB and NUS datasets show that the proposed DMDGM can significantly improve prediction performance compared with previous multi-modal deep generative models.