Common-Specific Multimodal Learning for Deep Belief Network

Changsheng Xiang, Xiaoming Jin
DOI: https://doi.org/10.1145/3132847.3133092
2017-01-01
Abstract: Multimodal Deep Belief Networks (DBNs) are widely used to extract representations of multimodal data by fusing the high-level features of each modality into a common representation. This straightforward fusion strategy can benefit classification and information retrieval tasks. However, it may introduce noise when the high-level features of different modalities are not naturally common and are therefore non-fusable. Intuitively, each modality may have its own specific features and corresponding representation capabilities, which should not simply be fused. It is therefore more reasonable to fuse only the common features and to represent multimodal data by both the fused features and the modality-specific features. Distinguishing common features from modality-specific features is challenging for traditional DBN models, in which all features are crudely mixed. This paper proposes the Common-Specific Multimodal Deep Belief Network (CS-DBN) to solve this problem. CS-DBN automatically separates common features from modality-specific features and fuses only the common ones for data representation. Experimental results demonstrate the superiority of CS-DBN over the baseline approaches on classification tasks.
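The representation described in the abstract (fused common features plus the specific features of each modality) can be illustrated with a minimal sketch. This is not the authors' CS-DBN: the abstract states that CS-DBN learns the common/specific separation automatically, whereas the sketch assumes a fixed split in which the first `n_common` hidden units of each modality are treated as common. The function names, the sigmoid joint layer, the random weights, and all sizes are hypothetical choices made only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_common(common_a, common_b, w_a, w_b, bias):
    """Joint layer over the common parts only (hypothetical sigmoid layer)."""
    fused = []
    for j in range(len(bias)):
        total = bias[j]
        total += sum(x * w_a[i][j] for i, x in enumerate(common_a))
        total += sum(x * w_b[i][j] for i, x in enumerate(common_b))
        fused.append(sigmoid(total))
    return fused


def represent(h_a, h_b, n_common, w_a, w_b, bias):
    """Build [fused common | specific to A | specific to B].

    Assumption: the first n_common units of each modality are "common".
    In CS-DBN this separation is learned, not fixed by index.
    """
    common_a, specific_a = h_a[:n_common], h_a[n_common:]
    common_b, specific_b = h_b[:n_common], h_b[n_common:]
    fused = fuse_common(common_a, common_b, w_a, w_b, bias)
    # Modality-specific features bypass the fusion layer untouched.
    return fused + specific_a + specific_b


random.seed(0)
n_common, n_joint = 3, 2  # hypothetical sizes

# Stand-ins for the high-level features produced by each modality's network.
h_img = [random.random() for _ in range(6)]
h_txt = [random.random() for _ in range(5)]

# Untrained random weights; a real model would learn these.
w_img = [[random.gauss(0, 0.1) for _ in range(n_joint)] for _ in range(n_common)]
w_txt = [[random.gauss(0, 0.1) for _ in range(n_joint)] for _ in range(n_common)]
bias = [0.0] * n_joint

rep = represent(h_img, h_txt, n_common, w_img, w_txt, bias)
print(len(rep))  # 7 = 2 fused + 3 image-specific + 2 text-specific
```

By contrast, a conventional multimodal DBN would pass all of `h_img` and `h_txt` through the joint layer, which is the step the abstract argues can introduce noise when some features are not shared across modalities.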