HiMul-LGG: A hierarchical decision fusion-based local-global graph neural network for multimodal emotion recognition in conversation

Changzeng Fu, Fengkui Qian, Kaifeng Su, Yikai Su, Ze Wang, Jiaqi Shi, Zhigang Liu, Chaoran Liu, Carlos Toshinori Ishi
DOI: https://doi.org/10.1016/j.neunet.2024.106764
2024-09-28
Abstract: Emotion recognition in conversation (ERC) is a vital task that requires deciphering human emotions through analysis of contextual and multimodal information. However, extant research on ERC concentrates predominantly on investigating multimodal fusion while overlooking the model's constraints in dealing with unimodal representation discrepancy and speaker dependencies. To address the aforementioned problems, this paper proposes a Hierarchical decision fusion-based Local-Global Graph Neural Network for multimodal ERC (HiMul-LGG). HiMul-LGG employs a hierarchical decision fusion strategy to ensure feature alignment across modalities. Moreover, HiMul-LGG also adopts a local-global graph neural network architecture to reinforce inter-modality and intra-modality speaker dependency. Additionally, HiMul-LGG utilizes a cross-modal multi-head attention mechanism to promote interplay between modalities. We evaluate HiMul-LGG on two emotion recognition datasets, IEMOCAP and MELD, where HiMul-LGG outperforms existing methods. The results of the ablation study also imply the effectiveness of the proposed hierarchical decision fusion strategy and local-global structure of Graph construction.