HAAN-ERC: Hierarchical Adaptive Attention Network for Multimodal Emotion Recognition in Conversation

Tao Zhang, Zhenhua Tan, Xiaoer Wu
DOI: https://doi.org/10.1007/s00521-023-08638-2
2023-01-01
Neural Computing and Applications
Abstract: Multimodal emotional expressions affect the progress of conversation in complex ways in our lives. For multimodal emotion recognition in conversation (ERC), previous studies focus on modeling partial influences of speaker and modality to infer emotion states in the historical context, based on traditional modeling units. However, despite the tremendous success of the Transformer in broad fields, how to effectively model intra- and inter-speaker and intra- and intermodal influences in the historical dialogue context based on the Transformer has not yet been tackled. In this paper, we propose a novel methodology, HAAN-ERC, which hierarchically uses dialogue context information to model intra-speaker, inter-speaker, intra-modal, and intermodal influences to infer the emotional state of speakers. In addition, we propose an adaptive attention mechanism, which can be trained in an end-to-end manner and automatically makes a unique decision for each speaker to omit redundant or valueless utterances from the historical context at multiple hierarchies for adaptive fusion. The performance of HAAN-ERC is comprehensively evaluated on two popular multimodal ERC datasets, IEMOCAP and MELD, and it achieves new state-of-the-art results. The encouraging results prove the validity of our HAAN-ERC. Our original code will be publicly available at https://github.com/TAN-OpenLab/HAAN-ERC.
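The idea of omitting redundant or valueless utterances before fusion can be illustrated with a minimal sketch. This is not the HAAN-ERC implementation: the function name `gated_attention`, the gate weights `gate_w`, and the `threshold` parameter are all hypothetical, and the hard threshold is used only for readability (an end-to-end trainable version would need a differentiable relaxation of this selection step). The sketch scores each historical utterance with a sigmoid gate, drops utterances whose gate falls below the threshold, and applies scaled dot-product attention over the remainder.

```python
import math

def gated_attention(query, history, gate_w, threshold=0.5):
    """Illustrative only: attend over historical utterance vectors,
    skipping those whose gate value falls below `threshold`.
    `gate_w` is a hypothetical weight vector, not from the paper."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # One gate per utterance: sigmoid of a linear score.
    gates = [1.0 / (1.0 + math.exp(-dot(gate_w, h))) for h in history]
    # Indices of the utterances that survive the gate.
    keep = [i for i, g in enumerate(gates) if g >= threshold]
    if not keep:
        return [0.0] * len(query), gates, keep

    # Scaled dot-product attention restricted to the kept utterances.
    scale = math.sqrt(len(query))
    scores = [dot(query, history[i]) / scale for i in keep]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the kept utterance vectors.
    fused = [sum(w * history[i][d] for w, i in zip(weights, keep))
             for d in range(len(query))]
    return fused, gates, keep

# Toy example: the third utterance receives a low gate and is omitted.
history = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
fused, gates, keep = gated_attention([1.0, 0.0], history, gate_w=[1.0, 1.0])
```

In this toy setting each speaker could be given its own `gate_w`, which is one simple way to read "a unique decision for each speaker"; how the paper actually parameterizes that decision is not specified in the abstract.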