Unlocking the Power of Multimodal Learning for Emotion Recognition in Conversation

Yunxiao Wang, Meng Liu, Zhe Li, Yupeng Hu, Xin Luo, Liqiang Nie
DOI: https://doi.org/10.1145/3581783.3613846
2023-01-01
Abstract: Emotion recognition in conversation aims to identify the emotions underlying each utterance, and it has great potential in various domains. Human perception of emotions relies on multiple modalities, such as language, vocal tonality, and facial expressions. While many studies have incorporated multimodal information to enhance emotion recognition, the performance of multimodal models often plateaus when additional modalities are added. We demonstrate through experiments that the main reason for this plateau is an imbalanced assignment of gradients across modalities. To address this issue, we propose fine-grained adaptive gradient modulation, a plug-in approach to rebalance the gradients of modalities. Experimental results show that our method improves the performance of all baseline models and outperforms existing plug-in methods.
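To make the idea of rebalancing gradients across modalities concrete, here is a minimal sketch in plain Python. It is not the paper's fine-grained adaptive gradient modulation, whose exact rule the abstract does not give. It assumes a much simpler stand-in: rescale each modality's gradient so that all modalities share the same L2 norm (the mean of the original norms). The function names `grad_norm` and `rebalance`, the modality keys, and the numeric values are all hypothetical.

```python
import math


def grad_norm(grads):
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def rebalance(grads_by_modality, eps=1e-12):
    """Rescale each modality's gradient to the mean norm across modalities.

    Illustrative assumption only: a simple norm-equalizing rule, not the
    method proposed in the paper.
    """
    norms = {m: grad_norm(g) for m, g in grads_by_modality.items()}
    target = sum(norms.values()) / len(norms)
    return {
        m: [g * target / (norms[m] + eps) for g in grads]
        for m, grads in grads_by_modality.items()
    }


# Hypothetical gradients: the text branch dominates the other two.
grads = {
    "text": [3.0, 4.0],    # norm 5.0
    "audio": [0.3, 0.4],   # norm 0.5
    "visual": [0.0, 0.5],  # norm 0.5
}
balanced = rebalance(grads)
```

After rescaling, each modality's gradient keeps its direction but has the same magnitude, so no single modality dominates the parameter update in this toy setting. A real plug-in would apply such scaling to the gradients of each modality's encoder during training.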