Abstract: Emotion recognition in conversation (ERC) is an important research topic in artificial intelligence. Different from the emotion estimation in individual utterances, ERC requires proper handling of human interactions in conversations. Several approaches have been proposed for ERC and achieved promising results. In this paper, we propose a correction model for previous approaches, called "Dialogical Emotion Correction Network (DECN)". This model aims to automatically correct some errors made by emotion recognition strategies and further improve the recognition performance. Specifically, DECN employs a graphical network to model human interactions in conversations. To further utilize the contextual information, DECN also employs the multi-head attention based bi-directional GRU component. Since DECN is a correction model for ERC, it can be easily integrated with any emotion recognition strategy. Experimental results on the IEMOCAP and MELD datasets verify the effectiveness of our proposed method. DECN can improve the performance of emotion recognition strategies with few parameters and low computational complexity.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses error correction in Emotion Recognition in Conversation (ERC). Unlike emotion recognition on isolated utterances, ERC must account for interpersonal interactions within a conversation. Although existing methods achieve good results on ERC, they still misclassify some utterances. The authors therefore propose a new model, the Dialogical Emotion Correction Network (DECN), which aims to automatically correct some of the errors made by an emotion recognition strategy and thereby improve recognition performance.

### Main Contributions

1. **Proposing the DECN Model**: DECN is an emotion correction model for ERC. It corrects errors made by an emotion recognition engine by exploiting interpersonal interactions and contextual information in the conversation.
2. **Systematic Study of the Importance of DECN Components**: The study covers interaction modeling based on Gated Graph Neural Networks (GGNN) and context-sensitive modeling based on a Multi-Head Attention Bidirectional GRU (MHA-GRU).
3. **Flexibility and Few Parameters**: DECN has few parameters and low computational complexity, and because it operates on the output of a recognizer it can be integrated with any emotion recognition system.
4. **Experimental Validation**: Experimental results on two benchmark datasets (IEMOCAP and MELD) show that DECN improves the performance of various emotion recognition engines.

### Method Overview

1. **Preliminary Emotion Recognition**: An emotion recognition engine first classifies each utterance, producing initial emotion probabilities.
2. **Interaction Modeling**: A Gated Graph Neural Network (GGNN) models interpersonal interactions in the conversation, covering both self-influence and mutual influence. Each utterance node is connected to the previous utterance node of the same speaker and to the previous utterance node of the other speaker, and a different edge type is used for each of these two influences.
3. **Context-Sensitive Modeling**: The Multi-Head Attention Bidirectional GRU (MHA-GRU) module extracts contextual information, strengthening the model's context awareness.
4. **Model Training**: The final emotion category probabilities are computed by a fully connected layer followed by a softmax layer, and the model is trained with the cross-entropy loss.

### Experimental Setup

- **Datasets**: IEMOCAP and MELD, which contain lexical, visual, and acoustic information for each conversation. The paper focuses mainly on the acoustic and lexical modalities.
- **Hyperparameter Settings**: The GGNN uses 5 propagation steps, and the feature dimension of the node representations is set to the number of emotion categories. The MHA-GRU consists of one bidirectional GRU layer (8 units per GRU component) and one multi-head self-attention layer (16-dimensional states and 2 attention heads).

With this design, DECN improves the accuracy of emotion recognition while keeping computational complexity low.
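The edge rule from the interaction-modeling step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_edges`, the edge labels `"self"` and `"other"`, and the choice to direct each edge from the earlier utterance to the later one are all assumptions made for illustration. The sketch also treats every speaker other than the current one as the "other" speaker, which is a simplification for conversations with more than two participants.

```python
from typing import List, Tuple

# Hypothetical edge-type labels (not taken from the paper).
SELF_EDGE = "self"    # influence from the same speaker's previous utterance
OTHER_EDGE = "other"  # influence from a different speaker's previous utterance


def build_edges(speakers: List[str]) -> List[Tuple[int, int, str]]:
    """Return (source, target, edge_type) triples for one conversation.

    speakers[i] is the speaker of utterance i. Edges point from the
    earlier utterance to the current one (an assumed direction).
    """
    edges: List[Tuple[int, int, str]] = []
    for t, speaker in enumerate(speakers):
        prev_same = None
        prev_other = None
        # Scan backwards for the nearest earlier utterance of each kind.
        for s in range(t - 1, -1, -1):
            if speakers[s] == speaker and prev_same is None:
                prev_same = s
            elif speakers[s] != speaker and prev_other is None:
                prev_other = s
            if prev_same is not None and prev_other is not None:
                break
        if prev_same is not None:
            edges.append((prev_same, t, SELF_EDGE))
        if prev_other is not None:
            edges.append((prev_other, t, OTHER_EDGE))
    return edges


# Example: a five-utterance conversation between speakers A and B.
edges = build_edges(["A", "B", "A", "A", "B"])
for source, target, edge_type in edges:
    print(source, "->", target, edge_type)
```

In this example, utterance 4 (speaker B) receives a `self` edge from utterance 1, B's previous turn, and an `other` edge from utterance 3, the most recent turn by A. In a full model, the typed edge list would be passed to the graph network together with the initial emotion probabilities as node features.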

DECN: Dialogical emotion correction network for conversational emotion recognition

Emotion Recognition in Conversation Based on a Dynamic Complementary Graph Convolutional Network

DialoguePCN: Perception and Cognition Network for Emotion Recognition in Conversations

Dialogue emotion model based on local–global context encoder and commonsense knowledge fusion attention

Conversational emotion recognition studies based on graph convolutional neural networks and a dependent syntactic analysis

Contextual Information and Commonsense Based Prompt for Emotion Recognition in Conversation

Watch the Speakers: A Hybrid Continuous Attribution Network for Emotion Recognition in Conversation With Emotion Disentanglement

Multimodal Knowledge-enhanced Interactive Network with Mixed Contrastive Learning for Emotion Recognition in Conversation

ERNetCL: A novel emotion recognition network in textual conversation based on curriculum learning strategy

A Contextual Attention Network for Multimodal Emotion Recognition in Conversation

Context- and Sentiment-Aware Networks for Emotion Recognition in Conversation

A Multi-Level Alignment and Cross-Modal Unified Semantic Graph Refinement Network for Conversational Emotion Recognition

DER-GCN: Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialogue Emotion Recognition

LR-GCN: Latent Relation-Aware Graph Convolutional Network for Conversational Emotion Recognition

EmotionIC: emotional inertia and contagion-driven dependency modeling for emotion recognition in conversation

RBA-GCN: Relational Bilevel Aggregation Graph Convolutional Network for Emotion Recognition

DER-GCN: Dialog and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialog Emotion Recognition

MM-DFN: Multimodal Dynamic Fusion Network for Emotion Recognition in Conversations

Emotion recognition in conversations with emotion shift detection based on multi-task learning

ITEACH-Net: Inverted Teacher-studEnt seArCH Network for Emotion Recognition in Conversation

Enhancing Emotion Recognition in Conversation through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning