Watch the Speakers: A Hybrid Continuous Attribution Network for Emotion Recognition in Conversation With Emotion Disentanglement

Shanglin Lei, Xiaoping Wang, Guanting Dong, Jiang Li, Yingjian Liu
2023-09-19
Abstract: Emotion Recognition in Conversation (ERC) has attracted widespread attention in the natural language processing field due to its enormous potential for practical applications. Existing ERC methods struggle to generalize to diverse scenarios because of insufficient modeling of context, ambiguous capture of dialogue relationships, and overfitting in speaker modeling. In this work, we present a Hybrid Continuous Attributive Network (HCAN) to address these issues from the perspective of emotional continuation and emotional attribution. Specifically, HCAN adopts a hybrid recurrent and attention-based module to model global emotion continuity. Then a novel Emotional Attribution Encoding (EAE) is proposed to model intra- and inter-emotional attribution for each utterance. Moreover, to enhance the robustness of the model in speaker modeling and improve its performance in different scenarios, a comprehensive loss function, the emotional cognitive loss $\mathcal{L}_{\rm EC}$, is proposed to alleviate emotional drift and overcome the overfitting of the model to speaker modeling. Our model achieves state-of-the-art performance on three datasets, demonstrating the superiority of our work. Extensive comparative experiments and ablation studies on the three benchmarks provide evidence for the efficacy of each module. Further generalization experiments show the plug-and-play nature of the EAE module in our method.
Computation and Language, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper aims to address several key challenges in Emotion Recognition in Conversation (ERC):

1. **Insufficient Context Modeling**:
   - Existing ERC methods struggle to capture global emotional continuity in long conversations. While methods based on recurrent neural networks can establish natural temporal correlations, they may not effectively capture emotional continuity in long conversations. On the other hand, methods based on attention mechanisms can aggregate emotional features at multiple levels but may not be as effective as temporal models in capturing emotional continuity over long time spans.
2. **Ambiguous Dialogue Relationships**:
   - Current methods lack detailed modeling of emotional influences in dialogue relationships. In real conversations, direct dialogue relationships often lead to more direct emotional transmission, but the ERC field still lacks methods that model emotional influences from the perspective of dialogue relationships in detail.
3. **Overfitting in Speaker Modeling**:
   - In ERC tasks, different speakers exhibit significant differences in emotional expression. Although some studies utilize complex network designs to leverage fine-grained information, these methods are prone to overfitting in different dialogue scenarios, thus limiting their effectiveness.

To address these issues, the authors propose a Hybrid Continuous Attributive Network (HCAN), which improves existing methods through the following approaches:

- **Emotional Continuation Encoding (ECE)**: Combines recurrent units and attention modules to extract more robust features in different dialogue scenarios, particularly excelling in long conversation samples.
- **Emotional Attribution Encoding (EAE)**: Based on the IA-attention mechanism, it models internal and external emotional attributions of each sentence from an attribution perspective, providing more direct and accurate human emotional understanding.
- **Emotional Cognitive Loss ($\mathcal{L}_{\rm EC}$)**: Enhances the model's robustness and generalization ability by combining cross-entropy, KL divergence, and adversarial emotional decoupling loss, mitigating emotional drift and overfitting in speaker modeling.

Through these innovations, HCAN achieves state-of-the-art performance on three benchmark datasets, demonstrating its superiority and generalization ability in different application scenarios.
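
To make the idea of a loss that combines cross-entropy, KL divergence, and an adversarial term more concrete, the following is a minimal sketch of a weighted sum of such terms for a single utterance. It is not the paper's formulation. The function names, the weights `kl_weight` and `adv_weight`, the choice of which two distributions the KL term compares, and the treatment of the adversarial term as a precomputed scalar are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the gold emotion label."""
    return -math.log(probs[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same emotion classes."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def combined_loss(logits, label, reference_probs,
                  adversarial_term=0.0, kl_weight=0.5, adv_weight=0.1):
    """Weighted sum of three terms (hypothetical weights and structure).

    reference_probs: an assumed reference distribution that the prediction
        is encouraged to stay close to (an assumption, not taken from the paper).
    adversarial_term: placeholder scalar standing in for an adversarial
        decoupling loss computed elsewhere.
    """
    probs = softmax(logits)
    ce = cross_entropy(probs, label)
    kl = kl_divergence(reference_probs, probs)
    return ce + kl_weight * kl + adv_weight * adversarial_term


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]      # scores for three emotion classes
    reference = [0.7, 0.2, 0.1]    # hypothetical reference distribution
    print(combined_loss(logits, label=0, reference_probs=reference))
```

In this sketch the cross-entropy term drives classification, the KL term penalizes predictions that stray from the reference distribution, and the adversarial term is simply passed in. A real implementation would compute that last term from a separate discriminator and would likely tune the weights per dataset.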