DER-GCN: Dialog and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialog Emotion Recognition

Wei Ai, Yuntao Shou, Tao Meng, Keqin Li

DOI: https://doi.org/10.1109/tnnls.2024.3367940

IF: 14.255

2024-01-01

IEEE Transactions on Neural Networks and Learning Systems

Abstract: With the continuous development of deep learning (DL), the task of multimodal dialog emotion recognition (MDER) has recently received extensive research attention and has become an essential branch of DL. MDER aims to identify the emotional information contained in different modalities, e.g., text, video, and audio, and in different dialog scenes. However, existing research has focused on modeling contextual semantic information and dialog relations between speakers while ignoring the impact of event relations on emotion. To address this gap, we propose a novel dialog and event relation-aware graph convolutional neural network (DER-GCN) for multimodal emotion recognition. It models dialog relations between speakers and captures latent event relation information. Specifically, we construct a weighted multirelationship graph to simultaneously capture the dependencies between speakers and the event relations in a dialog. Moreover, we introduce a self-supervised masked graph autoencoder (SMGAE) to improve the fusion representation ability of features and structures. Next, we design a new multiple information Transformer (MIT) to capture the correlation between different relations, which better fuses the multivariate information between relations. Finally, we propose a loss optimization strategy based on contrastive learning to enhance the representation learning ability of minority class features. We conduct extensive experiments on two benchmark datasets, Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Multimodal EmotionLines Dataset (MELD), which verify the effectiveness of the DER-GCN model. The results demonstrate that our model significantly improves both the average accuracy and the F1 value of emotion recognition. Our code is publicly available at https://github.com/yuntaoshou/DER-GCN.
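The weighted multirelationship graph mentioned in the abstract can be pictured with a small sketch. This is not the authors' implementation (see their repository for that); it only illustrates the general idea of typed, weighted edges between utterances. The function name `build_multirelation_graph`, the context `window`, the per-utterance event tags, and the use of cosine similarity as the edge weight are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_multirelation_graph(features, speakers, events, window=2):
    """Hypothetical helper: return {relation: [(i, j, weight), ...]}.

    Assumptions (not taken from the paper):
      - speaker-relation edges link utterances at most `window` turns apart;
      - event-relation edges link utterances that share a non-None event tag;
      - every edge is weighted by cosine similarity of the utterance features.
    """
    edges = {"same_speaker": [], "cross_speaker": [], "event": []}
    n = len(features)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            weight = cosine(features[i], features[j])
            if abs(i - j) <= window:
                rel = "same_speaker" if speakers[i] == speakers[j] else "cross_speaker"
                edges[rel].append((i, j, weight))
            if events[i] is not None and events[i] == events[j]:
                edges["event"].append((i, j, weight))
    return edges

# Toy dialog: four utterances, two speakers, one shared event tag.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
speakers = ["A", "B", "A", "B"]
events = ["e1", None, "e1", None]
graph = build_multirelation_graph(features, speakers, events, window=1)
```

In this toy dialog, utterances 0 and 2 are outside each other's one-turn window but still end up connected through the shared event tag, which is the kind of dependency a purely speaker-based graph would miss.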

computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture

What problem does this paper attempt to address?

The paper addresses a gap in Multimodal Dialogue Emotion Recognition (MDER): existing methods mainly model contextual semantic information and dialogue relationships, while neglecting the impact of event relationships on emotion recognition. Specifically, the paper points out:

1. **Internal and External Factors in Emotion Recognition**: During a dialogue, the speaker's emotions are influenced not only by internal factors (such as contextual information) but also by external factors (such as dialogue relationships and event relationships). However, existing research often overlooks the role of external factors.
2. **Data Imbalance Problem**: Due to the high cost of annotation, MDER datasets usually exhibit a long-tail distribution, so models perform poorly when recognizing minority class emotions.

To address these issues, the paper proposes a new Dialog and Event Relation-Aware Graph Convolutional Neural Network (DER-GCN) architecture, which aims to improve multimodal emotion recognition by modeling dialogue relationships and event relationships. The specific objectives include:

- **Cross-modal Feature Fusion**: Achieving more accurate emotion recognition by combining information from the text, video, and audio modalities.
- **Addressing the Data Imbalance Problem**: Improving the representations learned for minority emotion categories by optimizing the loss function through contrastive learning.
- **Learning More Distinctive Class Boundaries**: Enhancing the distinguishability of emotion categories by modeling event relationships.

In summary, the goal of the paper is to improve the accuracy and robustness of multimodal dialogue emotion recognition by introducing event relationships and optimizing the model structure.
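To make the contrastive-learning objective more concrete, here is a minimal sketch of a generic supervised contrastive loss, in which samples with the same emotion label are pulled together and all others are pushed apart. The exact loss used in DER-GCN is not given on this page, so this should be read as an illustration of the general technique rather than the paper's formulation. The function name `supervised_contrastive_loss` and the `temperature` value are assumptions.

```python
import math

def supervised_contrastive_loss(embeddings, labels, temperature=0.5):
    """Generic supervised contrastive loss (illustrative, not the paper's exact loss).

    For each anchor, every other sample with the same label is a positive;
    the loss is the mean negative log-probability of the positives under a
    softmax over the anchor's similarities to all other samples.
    """
    # L2-normalize so that dot products are cosine similarities.
    normed = []
    for e in embeddings:
        norm = math.sqrt(sum(x * x for x in e)) or 1.0
        normed.append([x / norm for x in e])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n) if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Embeddings that cluster by label should give a lower loss than
# the same embeddings paired with mismatched labels.
points = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
loss_clustered = supervised_contrastive_loss(points, [0, 0, 1, 1])
loss_mixed = supervised_contrastive_loss(points, [0, 1, 0, 1])
```

Because the loss is averaged over anchors and each anchor's term is normalized by its own number of positives, a minority-class sample with few positives is not drowned out by majority-class samples within its own term. That is one common reason such losses are used on long-tailed data.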

DER-GCN: Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialogue Emotion Recognition

MFDR: Multiple-stage Fusion and Dynamically Refined Network for Multimodal Emotion Recognition

Dynamic Graph Neural Ordinary Differential Equation Network for Multi-modal Emotion Recognition in Conversation

Conversational emotion recognition studies based on graph convolutional neural networks and a dependent syntactic analysis

MMDAG: Multimodal Directed Acyclic Graph Network for Emotion Recognition in Conversation

Emotion Recognition in Conversation Based on a Dynamic Complementary Graph Convolutional Network

Efficient Long-distance Latent Relation-aware Graph Neural Network for Multi-modal Emotion Recognition in Conversations

Multimodal Knowledge-enhanced Interactive Network with Mixed Contrastive Learning for Emotion Recognition in Conversation

Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation

MM-DFN: Multimodal Dynamic Fusion Network for Emotion Recognition in Conversations

GraphMFT: A Graph Network based Multimodal Fusion Technique for Emotion Recognition in Conversation

Affect-GCN: a multimodal graph convolutional network for multi-emotion with intensity recognition and sentiment analysis in dialogues

Multiple Knowledge-Enhanced Interactive Graph Network for Multimodal Conversational Emotion Recognition

Adaptive Graph Learning for Multimodal Conversational Emotion Detection

A Novel and Powerful Dual-Stream Multi-Level Graph Convolution Network for Emotion Recognition

LR-GCN: Latent Relation-Aware Graph Convolutional Network for Conversational Emotion Recognition

DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation

MLGAT: multi-layer graph attention networks for multimodal emotion recognition in conversations

Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition