Emotion recognition using multimodal deep learning in multiple psychophysiological signals and video

Zhongmin Wang,Xiaoxiao Zhou,Wenlang Wang,Chen Liang
DOI: https://doi.org/10.1007/s13042-019-01056-8
2020-01-20
International Journal of Machine Learning and Cybernetics
Abstract:Emotion recognition has attracted great interest. Numerous emotion recognition approaches have been proposed, most of which focus on visual, acoustic or psychophysiological information individually. Although more recent research has considered multimodal approaches, individual modalities are often combined only by simple fusion or are directly fused with deep learning networks at the feature level. In this paper, we propose an approach to training several specialist networks that employs deep learning techniques to fuse the features of individual modalities. This approach includes a multimodal deep belief network (MDBN), which optimizes and fuses unified psychophysiological features derived from the features of multiple psychophysiological signals, a bimodal deep belief network (BDBN) that focuses on representative visual features among the features of a video stream, and another BDBN that focuses on the high multimodal features in the unified features obtained from two modalities. Experiments are conducted on the BioVid Emo DB database and 80.89% accuracy is achieved, which outperforms the state-of-the-art approaches. The results demonstrate that the proposed approach can solve the problems of feature redundancy and lack of key features caused by multimodal fusion.
What problem does this paper attempt to address?