Speech Emotion Recognition in Dyadic Dialogues with Attentive Interaction Modeling

Jinming Zhao, Shizhe Chen, Jingjun Liang, Qin Jin
DOI: https://doi.org/10.21437/interspeech.2019-2103
2019-01-01
Abstract: In dyadic human-human interactions, a more complex interaction scenario, a person’s emotional state can be influenced by both self emotional evolution and the interlocutor’s behaviors. However, previous speech emotion recognition studies infer the speaker’s emotional state mainly based on the targeted speech segment without considering the above two contextual factors. In this paper, we propose an Attentive Interaction Model (AIM) to capture both self- and interlocutor-context to enhance speech emotion recognition in the dyadic dialog. The model learns to dynamically focus on long-term relevant contexts of the speaker and the interlocutor via the self-attention mechanism and fuse the adaptive context with the present behavior to predict the current emotional state. We carry out extensive experiments on the IEMOCAP corpus for dimensional emotion recognition in arousal and valence. Our model achieves on-par performance with baselines for arousal recognition and significantly outperforms baselines for valence recognition, which demonstrates the effectiveness of the model in selecting useful contexts for emotion recognition in dyadic interactions.