Hybrid Multi-Task Learning for End-To-End Multimodal Emotion Recognition

Junjie Chen, Jianhua Tao, Yongwei Li, Zhengqi Wen, Ziping Zhao, Xuefei Liu
DOI: https://doi.org/10.1109/APSIPAASC58517.2023.10317160
2023-10-31
Abstract: Multimodal emotion recognition plays a pivotal role in the advancement of natural human-computer interaction systems. Recent studies have attempted to apply multi-task learning to emotion recognition. However, in traditional methods the shared feature extractor must integrate the feature representations of different tasks, which can keep it from focusing on learning emotion representations. To address this problem, we propose a hybrid multi-task learning framework for end-to-end multimodal emotion recognition, in which the primary task is emotion classification and the auxiliary tasks are emotion regression and gender classification. The framework consists of two networks, one specialized in gender recognition and one in emotion recognition, where the latter transfers knowledge from the former through our proposed Deep Aggregation LSTM (DA-LSTM). By aggregating the emotion and gender feature extractors, the DA-LSTM can capture emotional information in discourse more precisely. Experimental results on the commonly used IEMOCAP dataset demonstrate the effectiveness of the proposed method.
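The abstract names one primary task (emotion classification) and two auxiliary tasks (emotion regression and gender classification) but does not state how their losses are combined. A common way to train such a setup is a weighted sum of per-task losses, and the sketch below illustrates that idea only. The function names, the choice of cross-entropy and mean squared error, and the weights `w_regression` and `w_gender` are all assumptions made for illustration; they are not taken from the paper, and the DA-LSTM itself is not modeled here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target class."""
    return -math.log(softmax(logits)[target_index])


def mean_squared_error(predicted, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def hybrid_multitask_loss(emotion_logits, emotion_label,
                          regression_pred, regression_target,
                          gender_logits, gender_label,
                          w_regression=0.5, w_gender=0.5):
    """Weighted sum of one primary and two auxiliary losses.

    Assumption: the weights and loss functions are hypothetical choices,
    not values reported in the paper.
    """
    primary = cross_entropy(emotion_logits, emotion_label)           # emotion classification
    aux_regression = mean_squared_error(regression_pred, regression_target)  # emotion regression
    aux_gender = cross_entropy(gender_logits, gender_label)          # gender classification
    return primary + w_regression * aux_regression + w_gender * aux_gender


# Usage with made-up values for a single utterance.
loss = hybrid_multitask_loss(
    emotion_logits=[0.0, 0.0, 0.0, 0.0], emotion_label=0,
    regression_pred=[0.2, 0.4, 0.6], regression_target=[0.2, 0.4, 0.6],
    gender_logits=[0.0, 0.0], gender_label=1,
)
```

Setting both auxiliary weights to zero reduces the objective to the primary emotion classification loss, which makes it easy to compare against a single-task baseline under the same assumptions.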
Computer Science