DER-GCN: Dialog and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialog Emotion Recognition

Wei Ai, Yuntao Shou, Tao Meng, Keqin Li
DOI: https://doi.org/10.1109/tnnls.2024.3367940
IF: 14.255
2024-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: With the continuous development of deep learning (DL), the task of multimodal dialog emotion recognition (MDER), an essential branch of DL, has recently received extensive research attention. MDER aims to identify the emotional information contained in different modalities, e.g., text, video, and audio, and in different dialog scenes. However, existing research has focused on modeling contextual semantic information and dialog relations between speakers while ignoring the impact of event relations on emotion. To tackle this issue, we propose a novel dialog and event relation-aware graph convolutional neural network (DER-GCN) for multimodal emotion recognition. It models dialog relations between speakers and captures latent event relation information. Specifically, we construct a weighted multirelationship graph to simultaneously capture the dependencies between speakers and event relations in a dialog. Moreover, we introduce a self-supervised masked graph autoencoder (SMGAE) to improve the fusion representation ability of features and structures. Next, we design a new multiple information Transformer (MIT) to capture the correlation between different relations, which provides better fusion of the multivariate information between relations. Finally, we propose a loss optimization strategy based on contrastive learning to enhance the representation learning ability of minority class features. We conduct extensive experiments on the benchmark datasets Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Multimodal EmotionLines Dataset (MELD), which verify the effectiveness of the DER-GCN model. The results demonstrate that our model significantly improves both the average accuracy and the F1 value of emotion recognition. Our code is publicly available at https://github.com/yuntaoshou/DER-GCN.
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture
What problem does this paper attempt to address?
The paper addresses a gap in multimodal dialog emotion recognition (MDER): existing methods mainly model contextual semantic information and dialog relations between speakers, while neglecting the impact of event relations on emotion. Specifically, the paper points out:

1. **Internal and External Factors in Emotion Recognition**: During a dialog, a speaker's emotions are influenced not only by internal factors (such as contextual information) but also by external factors (such as dialog relations and event relations). Existing research often overlooks part of these external factors, namely event relations.
2. **Data Imbalance Problem**: Because annotation is costly, MDER datasets usually follow a long-tail distribution, so models perform poorly on minority-class emotions.

To address these issues, the paper proposes the dialog and event relation-aware graph convolutional neural network (DER-GCN), which aims to improve multimodal emotion recognition by modeling both dialog relations and event relations. The specific objectives include:

- **Cross-modal Feature Fusion**: Achieving more accurate emotion recognition by combining information from the text, video, and audio modalities.
- **Addressing the Data Imbalance Problem**: Strengthening the learned representations of minority-class emotions by optimizing the loss function with contrastive learning.
- **Learning More Distinctive Class Boundaries**: Making emotion categories easier to distinguish by modeling event relations.

In summary, the goal of the paper is to improve the accuracy and robustness of multimodal dialog emotion recognition by introducing event relations and optimizing the model structure.
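
To make the contrastive-learning objective mentioned above more concrete, the following is a minimal sketch of a generic supervised contrastive loss in plain Python. It is not the loss defined in the paper, whose exact formulation is not given here. The function names, the `temperature` value, and the toy embeddings are all assumptions chosen for illustration. The idea it shows is that embeddings sharing an emotion label are pulled together and all others are pushed apart, which is one common way to give minority classes a tighter, more separable cluster.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def supervised_contrastive_loss(embeddings, labels, temperature=0.5):
    """Generic supervised contrastive loss (illustrative, not the paper's loss).

    For each anchor, samples with the same label are positives and every
    other sample acts as a contrast. Anchors with no positive are skipped.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # no same-class partner in this batch
        # Temperature-scaled cosine similarity to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        # Average negative log-probability of picking a positive.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy utterance embeddings (hypothetical values): two loose clusters.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(toy_embeddings, ["happy", "happy", "sad", "sad"])
misaligned = supervised_contrastive_loss(toy_embeddings, ["happy", "sad", "happy", "sad"])
print(aligned, misaligned)
```

In this toy setup the loss is lower when the labels agree with the geometric clusters than when they cut across them, which is the signal an encoder would be trained to exploit. In practice such a term is typically added to the classification loss with a weighting coefficient; how DER-GCN combines them is not specified in this summary.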