MLGAT: multi-layer graph attention networks for multimodal emotion recognition in conversations
Jun Wu, Junwei Wu, Yu Zheng, Pengfei Zhan, Min Han, Gan Zuo, Li Yang
DOI: https://doi.org/10.1007/s10844-024-00879-4
2024-10-06
Journal of Intelligent Information Systems
Abstract: With the rise of digital interactions, multimodal emotion recognition has gained significant research interest. In conversations, people use multiple modalities like text, voice, and images to convey emotions. However, effectively integrating and utilizing these different modalities remains challenging. We propose a novel multimodal emotion recognition model named Multi-Layer Graph Attention Network (MLGAT) to address this. The model constructs a graph structure for each modality (text, speech, and image) and introduces a multi-layer graph attention mechanism to capture relationships between nodes within each modality effectively. The MLGAT network model fuses graph structures from multiple modalities into a unified multimodal graph, allowing joint learning and feature fusion. During training, multimodal sentiment labels supervise the network, enabling the model to learn effective sentiment representations. Experimental results show that the MLGAT model significantly improves the accuracy and robustness of multimodal emotion recognition compared to traditional models.
computer science, information systems, artificial intelligence
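
The abstract does not give the model's equations, so the following is only a rough sketch of the general idea it describes: stacked graph attention layers applied to a graph that joins text, speech, and image nodes. It assumes the standard single-head graph attention formulation (softmax over LeakyReLU-scored neighbor pairs), assumes all modalities are already projected to a common feature dimension, and uses a hypothetical fusion rule (each utterance node links to nearby utterances in its own modality and to the same utterance in the other modalities). The names `build_unified_graph`, `attention_weights`, and `gat_layer`, the random weights, and the toy sizes are illustrative and are not taken from the paper.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z

def attention_weights(projected, i, nbrs, a):
    """Softmax-normalised attention of node i over its neighbours (standard GAT form)."""
    scores = [
        leaky_relu(sum(ak * v for ak, v in zip(a, projected[i] + projected[j])))
        for j in nbrs
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gat_layer(features, neighbors, W, a):
    """One single-head graph attention layer: project, score, aggregate."""
    projected = [matvec(W, h) for h in features]
    out = []
    for i in range(len(projected)):
        nbrs = neighbors[i]
        alphas = attention_weights(projected, i, nbrs, a)
        out.append([
            sum(al * projected[j][k] for al, j in zip(alphas, nbrs))
            for k in range(len(projected[i]))
        ])
    return out

def build_unified_graph(n_utts, n_modalities, window=1):
    """Hypothetical fusion rule: node id = modality * n_utts + utterance index."""
    neighbors = {}
    for m in range(n_modalities):
        for u in range(n_utts):
            node = m * n_utts + u
            nbrs = {node}  # self-loop
            # Intra-modal edges: utterances within the context window.
            for v in range(max(0, u - window), min(n_utts, u + window + 1)):
                nbrs.add(m * n_utts + v)
            # Cross-modal edges: the same utterance in every modality.
            for other in range(n_modalities):
                nbrs.add(other * n_utts + u)
            neighbors[node] = sorted(nbrs)
    return neighbors

# Toy run: 3 utterances x 3 modalities, 4-dimensional features, 2 stacked layers.
rng = random.Random(0)
n_utts, n_mod, dim = 3, 3, 4
features = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_utts * n_mod)]
graph = build_unified_graph(n_utts, n_mod)

hidden = features
for _ in range(2):  # "multi-layer": each layer has its own (random, untrained) weights
    W = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    a = [rng.uniform(-1, 1) for _ in range(2 * dim)]
    hidden = gat_layer(hidden, graph, W, a)  # no inter-layer nonlinearity, for brevity

print(len(hidden), len(hidden[0]))
```

In a trained model the weights `W` and `a` would be learned from the emotion labels, and the fused node representations would feed a classifier; the sketch leaves both out.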