MRFER: Multi-Channel Robust Feature Enhanced Fusion for Multi-Modal Emotion Recognition

Xiao Fu, Wei Xi, Zhao Yang, Rui Jiang, Dianwen Ng, Jie Yang, Jizhong Zhao
DOI: https://doi.org/10.1109/icme57554.2024.10688192
2024-01-01
Abstract: In multi-modal emotion recognition, previous studies focus on obtaining more distinguishable unimodal features and expanding complementary information across modalities. However, a considerable amount of latent emotional information is neglected, which leads to insufficient intra-modal representations and a one-sided perspective on inter-modal relationship learning. To address these challenges, we propose a novel framework named MRFER, which explores strategies to reduce the loss of emotional information. It models robust unimodal features through a multi-path feature extractor and captures more comprehensive inter-modal relationships through a text-guided dual attention fusion module. A systematic evaluation covering generalization and overall performance shows that MRFER advances beyond existing state-of-the-art approaches.