DECN: Dialogical emotion correction network for conversational emotion recognition

Zheng Lian,Bin Liu,Jianhua Tao
DOI: https://doi.org/10.1016/j.neucom.2021.05.017
IF: 6
2021-09-01
Neurocomputing
Abstract: Emotion recognition in conversation (ERC) is an important research topic in artificial intelligence. Different from the emotion estimation in individual utterances, ERC requires proper handling of human interactions in conversations. Several approaches have been proposed for ERC and achieved promising results. In this paper, we propose a correction model for previous approaches, called "Dialogical Emotion Correction Network (DECN)". This model aims to automatically correct some errors made by emotion recognition strategies and further improve the recognition performance. Specifically, DECN employs a graphical network to model human interactions in conversations. To further utilize the contextual information, DECN also employs the multi-head attention based bi-directional GRU component. Since DECN is a correction model for ERC, it can be easily integrated with any emotion recognition strategy. Experimental results on the IEMOCAP and MELD datasets verify the effectiveness of our proposed method. DECN can improve the performance of emotion recognition strategies with few parameters and low computational complexity.
computer science, artificial intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses error correction in Emotion Recognition in Conversations (ERC). Unlike emotion recognition in isolated utterances, ERC must handle interpersonal interactions within conversations. Although various methods have achieved good results in ERC, they still misclassify some utterances. To this end, the authors propose a new model, the Dialogical Emotion Correction Network (DECN), which aims to automatically correct certain errors made by emotion recognition strategies and further improve recognition performance.

### Main Contributions

1. **Proposing the DECN Model**: DECN is an emotion correction model for ERC that automatically corrects errors made by the emotion recognition engine by leveraging interpersonal interactions and contextual information in conversations.
2. **Systematic Study of the Importance of DECN Components**: This covers interaction modeling based on Gated Graph Neural Networks (GGNN) and context-sensitive modeling based on a Multi-Head Attention Bidirectional GRU (MHA-GRU).
3. **Flexibility and Few Parameters**: DECN has few parameters and low computational complexity, and it can be integrated with any emotion recognition system.
4. **Experimental Validation**: Experimental results on two benchmark datasets (IEMOCAP and MELD) show that DECN can significantly improve the performance of various emotion recognition engines.

### Method Overview

1. **Preliminary Emotion Recognition**: First, the emotion recognition engine performs preliminary emotion classification for each utterance, generating initial emotion probabilities.
2. **Interaction Modeling**: A Gated Graph Neural Network (GGNN) models interpersonal interactions in conversations, including self-influence and mutual influence. Specifically, each utterance node of the current target speaker is connected to the node of that speaker's previous utterance and to the node of the opposing speaker's previous utterance; different edge types distinguish these two influences.
3. **Context-Sensitive Modeling**: The Multi-Head Attention Bidirectional GRU (MHA-GRU) module extracts contextual information, further enhancing the model's context awareness.
4. **Model Training**: The final emotion category probabilities are calculated through a fully connected layer and a softmax layer, and the model is trained with the cross-entropy loss function.

### Experimental Setup

- **Datasets**: IEMOCAP and MELD, which contain lexical, visual, and acoustic information in conversations. This paper mainly focuses on the acoustic and lexical modalities.
- **Hyperparameter Settings**: The GGNN uses 5 propagation steps, and the feature dimension of node representations is set to the number of emotion categories. The MHA-GRU includes one bidirectional GRU layer (8 units per GRU component) and one multi-head self-attention layer (16-dimensional states and 2 attention heads).

Through these methods, DECN can significantly improve the accuracy of emotion recognition while maintaining low computational complexity.
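
The edge construction described under Interaction Modeling can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_edges`, the edge-type labels `SELF_EDGE` and `INTER_EDGE`, and the choice to direct each edge from the earlier utterance to the later one are assumptions made for illustration. The sketch also treats "the opposing speaker" as the most recent utterance by any different speaker, which is one possible reading for multi-party conversations.

```python
from typing import List, Tuple

# Hypothetical edge-type labels (not taken from the paper).
SELF_EDGE = 0   # self-influence: same speaker's previous utterance
INTER_EDGE = 1  # mutual influence: a different speaker's previous utterance


def build_edges(speakers: List[str]) -> List[Tuple[int, int, int]]:
    """Return (source, target, edge_type) triples for one conversation.

    speakers[i] is the speaker of utterance i. Each utterance receives at
    most two incoming edges: one from the most recent earlier utterance by
    the same speaker and one from the most recent earlier utterance by a
    different speaker. Edge direction (earlier -> later) is an assumption.
    """
    edges = []
    for i, spk in enumerate(speakers):
        prev_same = None
        prev_other = None
        # Scan backwards until both kinds of predecessor are found.
        for j in range(i - 1, -1, -1):
            if speakers[j] == spk and prev_same is None:
                prev_same = j
            elif speakers[j] != spk and prev_other is None:
                prev_other = j
            if prev_same is not None and prev_other is not None:
                break
        if prev_same is not None:
            edges.append((prev_same, i, SELF_EDGE))
        if prev_other is not None:
            edges.append((prev_other, i, INTER_EDGE))
    return edges


if __name__ == "__main__":
    # Two-speaker toy dialogue; the speaker names are illustrative only.
    for edge in build_edges(["A", "B", "A", "A", "B"]):
        print(edge)
```

In a setup like the one summarized above, each node would carry the initial emotion probabilities produced by the recognition engine, and a gated graph network would propagate information along these typed edges for a fixed number of steps.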