Multimodal Emotion Recognition by Extracting Common and Modality-Specific Information.

Wei Zhang, Weixi Gu, Fei Ma, Shiguang Ni, Lin Zhang, Shao-Lun Huang
DOI: https://doi.org/10.1145/3274783.3275200
2018-01-01
Abstract: Emotion recognition technologies have been widely used in numerous areas including advertising, healthcare and online education. Previous works usually recognize emotion from either the acoustic or the visual signal alone, yielding unsatisfactory performance and limited applications. To improve the inference capability, we present a multimodal emotion recognition model, EMOdal. Apart from learning from the audio and visual data separately, EMOdal efficiently learns the common and modality-specific information underlying the two kinds of signals, and therefore improves the inference ability. The model has been evaluated on our large-scale emotional data set. The comprehensive evaluations demonstrate that our model outperforms traditional approaches.