Abstract: In recent years, Intelligent Personal Assistants (IPAs) have emerged as important tools in human–computer interaction, with applications such as voice assistants, virtual customer service, and navigation. Capturing and understanding users' emotional needs is important for improving the quality of service of IPAs. Multimodal emotion recognition in conversation (MMERC), which aims to automatically identify and track the emotional states of speakers during a dialogue, has become a crucial component for building emotional IPAs and has attracted increasing attention. Current research in this field is based on graph simulation of cross-modal and single-modal interactions. However, these methods ignore the severe class imbalance inherent in MMERC, which reduces the model's generalization ability and prevents it from effectively recognizing minority emotion classes. Data mining methods use oversampling to address imbalanced classification, but they are unsuitable for MMERC because they disrupt the conversational coherence and modality alignment of multimodal emotion recognition datasets. To overcome these problems, this paper proposes IMBA-MMERC, an effective framework for addressing the pervasive class imbalance in MMERC. Within this framework, sample generation for multimodal conversation tackles the application challenges posed by multimodal conversational emotion recognition datasets, and a well-classified encouraging loss mitigates the performance degradation on certain majority classes caused by decision boundary deviations. On two English benchmark datasets and one Chinese public dataset, two performance indicators demonstrate the effectiveness and superiority of the proposed IMBA-MMERC. Ablation experiments, a case study, and histogram visualizations further verify the strong performance of the proposed framework.
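
The abstract does not give the form of the well-classified encouraging loss, so the following is only a minimal sketch under a stated assumption: that the loss is ordinary cross-entropy plus a bonus term log(1 - p), where p is the predicted probability of the true class. The function names (`softmax`, `cross_entropy`, `encouraging_loss`) and the `eps` clamp are hypothetical and chosen for illustration; they are not taken from IMBA-MMERC.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for a single example."""
    return -math.log(softmax(logits)[target])


def encouraging_loss(logits, target, eps=1e-12):
    """Cross-entropy plus an assumed bonus term log(1 - p).

    ASSUMPTION: this form is illustrative only. The bonus is negative and
    grows in magnitude as p approaches 1, so examples that are already
    well classified keep contributing to the objective.
    """
    p = softmax(logits)[target]
    bonus = math.log(max(1.0 - p, eps))  # clamp avoids log(0) when p == 1
    return -math.log(p) + bonus


# A confidently correct example (true class 0) and an uncertain one.
confident = [4.0, 0.0, 0.0]
uncertain = [0.5, 0.0, 0.0]

for name, logits in (("confident", confident), ("uncertain", uncertain)):
    ce = cross_entropy(logits, 0)
    el = encouraging_loss(logits, 0)
    print(f"{name}: cross-entropy={ce:.4f}, encouraging={el:.4f}")
```

Under this assumed form, the gap between the two losses widens as the prediction becomes more confident, which is one way a loss can keep rewarding well-classified majority-class samples instead of letting them fade from the gradient.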

Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition

An Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations

MALN: Multimodal Adversarial Learning Network for Conversational Emotion Recognition

Generating and encouraging: An effective framework for solving class imbalance in multimodal emotion recognition conversation

Fusing pairwise modalities for emotion recognition in conversations

Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition

Bridging the Emotional Semantic Gap via Multimodal Relevance Estimation

M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation

MultiEMO: An Attention-Based Correlation-Aware Multimodal Fusion Framework for Emotion Recognition in Conversations

CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation

AM^2-EmoJE: Adaptive Missing-Modality Emotion Recognition in Conversation via Joint Embedding Learning

Speaker-aware cognitive network with cross-modal attention for multimodal emotion recognition in conversation

Multimodal Knowledge-enhanced Interactive Network with Mixed Contrastive Learning for Emotion Recognition in Conversation

Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework

Self-adaptive Context and Modal-interaction Modeling For Multimodal Emotion Recognition

Enhancing Emotion Recognition in Conversation through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning

Multimodal Emotion Recognition Using Multimodal Deep Learning

Multimodal Emotion Recognition Using Deep Generalized Canonical Correlation Analysis with an Attention Mechanism

Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations

MM-DFN: Multimodal Dynamic Fusion Network for Emotion Recognition in Conversations