First-order Multi-label Learning with Cross-modal Interactions for Multimodal Emotion Recognition

Yunrui Cai, Jingran Xie, Boshi Tang, Yuanyuan Wang, Jun Chen, Haiwei Xue, Zhiyong Wu
DOI: https://doi.org/10.1145/3607865.3613181
2023-01-01
Abstract: Multimodal emotion recognition (MER) is essential for machines to fully understand human intentions. Various deep neural network based models have been proposed, but modeling and fusing multimodal features effectively remains challenging. In addition, recent studies have focused on the classification task of predicting discrete labels and have paid little attention to dimensional values. In this paper, we propose a multimodal fusion model based on the Transformer architecture and cross-modal interactions, and adopt a multi-label learning algorithm with a first-order strategy to predict discrete labels and dimensional values separately. We also propose a semi-supervised learning method, moment injection, that uses unlabeled data to enhance the robustness of the model. Finally, we use ensemble learning to further improve performance. We evaluate the proposed method on the MER-MULTI sub-challenge of the Multimodal Emotion Recognition Challenge (MER 2023). Experimental results show that the proposed method achieves an evaluation metric score of 0.6765 on the test set.
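The abstract describes predicting discrete labels and dimensional values separately under a first-order strategy. The sketch below is one possible reading of that idea, not the authors' implementation: two independent linear heads sit on a shared fused feature vector, and the training loss is a plain sum of a cross-entropy term and a squared-error term. The function names, the `val_weight` parameter, the head shapes, and the choice of loss functions are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Apply a dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def first_order_loss(fused, emo_head, val_head, emo_target, val_target,
                     val_weight=1.0):
    """Hypothetical first-order objective (an assumption, not the paper's code).

    Each output is predicted by its own head from the same fused feature,
    with no modeling of the dependency between the two outputs.
    `emo_head` and `val_head` are (weights, bias) tuples.
    """
    emo_logits = linear(*emo_head, fused)      # discrete emotion scores
    val_pred = linear(*val_head, fused)[0]     # single dimensional value
    probs = softmax(emo_logits)
    ce = -math.log(probs[emo_target])          # classification term
    se = (val_pred - val_target) ** 2          # regression term (squared error)
    return ce + val_weight * se, probs, val_pred


# Toy usage with made-up numbers: 2-dim fused feature, 2 emotion classes.
fused = [1.0, 0.0]
emo_head = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
val_head = ([[2.0, 0.0]], [0.0])
loss, probs, val_pred = first_order_loss(fused, emo_head, val_head,
                                         emo_target=0, val_target=1.0)
```

In this toy setup the two heads share only the input feature, so either term can be reweighted or dropped without touching the other head, which is the practical appeal of treating the outputs independently.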