Multimodal Dimensional and Continuous Emotion Recognition in Dyadic Video Interactions.

Jinming Zhao, Shizhe Chen, Qin Jin
DOI: https://doi.org/10.1007/978-3-030-00776-8_28
2018-01-01
Abstract: Automatic emotion recognition is a challenging task that can greatly improve natural human-computer interaction. In dyadic human-human interactions, a more complex interaction scenario, a person's emotional state is influenced by the interlocutor's behaviors, such as talking style and prosody, speech content, facial expression, and body language. Mutual influence, a person's influence on the interacting partner's behaviors in a dialog, has been shown in previous work to be important for predicting that person's emotional state. In this paper, we propose several multimodal interaction strategies that imitate the interactive patterns of real scenarios in order to explore the effect of mutual influence in continuous emotion prediction tasks. Our experiments are based on the Audio/Visual Emotion Challenge (AVEC) 2017 dataset for continuous emotion prediction, and the results show that our proposed multimodal interaction strategy yields absolute improvements of 3.82% and 3.26% on arousal and valence, respectively. Additionally, we analyse how the correlation between the interactive pairs influences both arousal and valence. Our experimental results show that interactive pairs with strong correlation significantly outperform pairs with weak correlation on both arousal and valence.