Abstract: In this paper, we address the task of utterance-level emotion recognition in conversations using commonsense knowledge. We propose COSMIC, a new framework that incorporates different elements of commonsense such as mental states, events, and causal relations, and build upon them to learn interactions between interlocutors participating in a conversation. Current state-of-the-art methods often encounter difficulties in context propagation, emotion shift detection, and differentiating between related emotion classes. By learning distinct commonsense representations, COSMIC addresses these challenges and achieves new state-of-the-art results for emotion recognition on four different benchmark conversational datasets. Our code is available at https://github.com/declare-lab/conv-emotion.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

The paper addresses Emotion Recognition in Conversations (ERC). The authors propose COSMIC (COmmonSense knowledge for eMotion Identification in Conversations), a framework that improves utterance-level emotion recognition by incorporating commonsense knowledge.

### Background and Challenges

1. **Complexity**: Natural conversations are shaped by many variables, including topics, viewpoints, speaker personalities, argument logic, and intentions. These variables make the emotional dynamics of a conversation complex.
2. **Context Propagation**: Existing state-of-the-art methods struggle to propagate context across a conversation, which makes emotion shifts hard to detect.
3. **Emotion Category Distinction**: Related emotion categories, such as anger and irritation, are hard to tell apart.

### Solution

1. **Commonsense Knowledge**: COSMIC draws on different elements of commonsense knowledge, such as mental states, events, and causal relations, to learn the interactions between participants in a conversation.
2. **Multi-stage Modeling**:
   - **Context-Independent Feature Extraction**: Uses a pre-trained Transformer language model (e.g., RoBERTa) to extract features for each utterance, independent of its conversational context.
   - **Commonsense Feature Extraction**: Extracts commonsense features from commonsense knowledge graphs.
   - **Incorporating Commonsense Knowledge**: Uses the commonsense features to build better context representations, which feed the final emotion classification.

### Experimental Results

1. **Datasets**: The paper reports experiments on four benchmark conversational datasets: IEMOCAP, DailyDialog, MELD, and EmoryNLP.
2. **Performance Improvement**: COSMIC achieves new state-of-the-art results on all four datasets.

### Conclusion

By incorporating commonsense knowledge, COSMIC can better model and predict the emotional dynamics of a conversation, addressing the shortcomings of existing methods in context propagation and emotion category distinction.
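As an illustration of the three stages listed under Solution, the following is a minimal toy sketch, not the authors' implementation (their code is at the repository linked in the abstract). Everything here is an assumption made for illustration: `embed` is a hash-based stand-in for both the pre-trained language model and the commonsense knowledge graph features, the running per-speaker state is a simple moving average, the label set `EMOTIONS` is hypothetical, and the classifier weights are untrained, so the predicted labels carry no meaning. The sketch only shows how the data might flow from utterance to emotion distribution.

```python
import hashlib
import math

EMOTIONS = ["neutral", "joy", "anger"]  # hypothetical label set
DIM = 8  # toy feature size (must not exceed the 32-byte SHA-256 digest)


def embed(text, salt, dim=DIM):
    """Deterministic stand-in encoder: hash each token into a unit vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256((salt + token).encode("utf-8")).digest()
        for i in range(dim):
            vec[i] += digest[i] / 255.0 - 0.5
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def utterance_features(text):
    # Stage 1 (assumed stand-in): context-independent utterance features.
    return embed(text, "utt:")


def commonsense_features(text):
    # Stage 2 (assumed stand-in): features that a commonsense source would supply.
    return embed(text, "cs:")


def update_state(prev, utt, cs, decay=0.5):
    # Stage 3 (assumed): blend the speaker's previous state with new evidence.
    return [decay * p + (1 - decay) * (u + c) / 2 for p, u, c in zip(prev, utt, cs)]


def classify(state, weights):
    """Softmax over one linear score per emotion."""
    scores = [sum(w * s for w, s in zip(row, state)) for row in weights]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def run_dialogue(dialogue, weights):
    """Return (speaker, predicted label, probabilities) for each utterance."""
    speaker_state = {}
    outputs = []
    for speaker, text in dialogue:
        prev = speaker_state.get(speaker, [0.0] * DIM)
        state = update_state(prev, utterance_features(text), commonsense_features(text))
        speaker_state[speaker] = state
        probs = classify(state, weights)
        outputs.append((speaker, EMOTIONS[probs.index(max(probs))], probs))
    return outputs


if __name__ == "__main__":
    toy_dialogue = [
        ("A", "I finally got the job"),
        ("B", "That is wonderful news"),
        ("A", "I am so relieved"),
    ]
    toy_weights = [embed(label, "w:") for label in EMOTIONS]  # untrained
    for speaker, label, probs in run_dialogue(toy_dialogue, toy_weights):
        print(speaker, label, [round(p, 3) for p in probs])
```

Keeping one state per speaker means an utterance is classified in light of what that same speaker said earlier, which is one simple way to carry context forward. In a real system the stand-in functions would be replaced by a trained encoder, a commonsense model, and learned recurrent updates.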

COSMIC: COmmonSense knowledge for eMotion Identification in Conversations

A Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

From Multilingual Complexity to Emotional Clarity: Leveraging Commonsense to Unveil Emotions in Code-Mixed Dialogues

Multimodal Knowledge-enhanced Interactive Network with Mixed Contrastive Learning for Emotion Recognition in Conversation

Emotion Recognition in Conversation Based on a Dynamic Complementary Graph Convolutional Network

Investigating Multisensory Integration in Emotion Recognition Through Bio-Inspired Computational Models

COGMEN: COntextualized GNN based Multimodal Emotion recognitioN

Contextual Information and Commonsense Based Prompt for Emotion Recognition in Conversation

CKERC : Joint Large Language Models with Commonsense Knowledge for Emotion Recognition in Conversation

Long Dialogue Emotion Detection Based on Commonsense Knowledge Graph Guidance

A Contextual Attention Network for Multimodal Emotion Recognition in Conversation

COSMIC: A Conversational Interface for Human-AI Music Co-Creation

Fuzzy commonsense reasoning for multimodal sentiment analysis

Emotion Recognition for Multiple Context Awareness.

Context- and Sentiment-Aware Networks for Emotion Recognition in Conversation

Cluster-Level Contrastive Learning for Emotion Recognition in Conversations

SMIN: Semi-supervised Multi-modal Interaction Network for Conversational Emotion Recognition

Learning Fine-Grained Cross Modality Excitement for Speech Emotion Recognition

Emotion Recognition in Conversation using Probabilistic Soft Logic

Dialogue emotion model based on local–global context encoder and commonsense knowledge fusion attention

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling