Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition

Cam-Van Thi Nguyen, Cao-Bach Nguyen, Quang-Thuy Ha, Duc-Trong Le
2024-03-08
Abstract: Emotion recognition in conversation (ERC) is a crucial task in natural language processing and affective computing. This paper proposes MultiDAG+CL, a novel approach for Multimodal Emotion Recognition in Conversation (ERC) that employs a Directed Acyclic Graph (DAG) to integrate textual, acoustic, and visual features within a unified framework. The model is enhanced by Curriculum Learning (CL) to address challenges related to emotional shifts and data imbalance. Curriculum learning facilitates the learning process by gradually presenting training samples in a meaningful order, thereby improving the model's performance in handling emotional variations and data imbalance. Experimental results on the IEMOCAP and MELD datasets demonstrate that the MultiDAG+CL models outperform baseline models. We release the code for MultiDAG+CL and experiments:
Machine Learning
What problem does this paper attempt to address?
The paper addresses Multimodal Emotion Recognition in Conversation (ERC). The authors propose MultiDAG+CL, which combines a Directed Acyclic Graph (DAG) with Curriculum Learning (CL). A DAG-GNN integrates text, audio, and visual features within a unified framework, giving a more comprehensive representation of emotional expression. A curriculum learning strategy, which presents training samples gradually in a meaningful order, is added to handle emotional shifts and data imbalance. Experimental results show that MultiDAG+CL outperforms baseline models on the IEMOCAP and MELD datasets.
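
To make the idea of presenting training samples gradually, in a meaningful order, more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes, purely for illustration, that a conversation's difficulty is the fraction of adjacent utterances whose emotion label changes, and that training proceeds in stages that add progressively harder conversations. The names `shift_ratio` and `curriculum_stages` and the toy labels are hypothetical.

```python
from typing import List, Sequence


def shift_ratio(labels: Sequence[str]) -> float:
    """Fraction of adjacent utterance pairs whose emotion label changes.

    Assumption: more emotional shifts means a harder conversation.
    """
    if len(labels) < 2:
        return 0.0
    shifts = sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    return shifts / (len(labels) - 1)


def curriculum_stages(conversations: List[List[str]], num_buckets: int) -> List[List[int]]:
    """Return, for each training stage, the indices of conversations in use.

    Conversations are sorted from easy to hard and split into buckets.
    Stage k trains on the k easiest buckets, so later stages keep the
    easy data and add harder data on top of it.
    """
    order = sorted(range(len(conversations)),
                   key=lambda i: shift_ratio(conversations[i]))
    bucket_size = -(-len(order) // num_buckets)  # ceiling division
    return [order[: stage * bucket_size] for stage in range(1, num_buckets + 1)]


# Toy emotion-label sequences, one list per conversation (hypothetical data).
conversations = [
    ["happy", "sad", "angry", "neutral"],  # every pair shifts: ratio 1.0
    ["neutral", "neutral", "neutral"],     # no shifts: ratio 0.0
    ["sad", "sad", "angry"],               # one shift in two pairs: ratio 0.5
    ["happy", "happy", "happy", "sad"],    # one shift in three pairs: ratio 1/3
]

stages = curriculum_stages(conversations, num_buckets=2)
for stage_number, indices in enumerate(stages, start=1):
    print(f"stage {stage_number}: train on conversations {indices}")
```

In this sketch the first stage sees only the two conversations with the fewest emotional shifts, and the second stage sees all four. A real training loop would run one or more epochs per stage over the selected conversations; the difficulty measure and the number of buckets are design choices that the summary above does not specify.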