KoHMT: A Multimodal Emotion Recognition Model Integrating KoELECTRA, HuBERT with Multimodal Transformer

Moung-Ho Yi, Keun-Chang Kwak, Ju-Hyun Shin
DOI: https://doi.org/10.3390/electronics13234674
IF: 2.9
2024-11-28
Electronics
Abstract: With the advancement of human-computer interaction, the role of emotion recognition has become increasingly significant. Emotion recognition technology provides practical benefits across various industries, including user experience enhancement, education, and organizational productivity. For instance, in educational settings, it enables real-time understanding of students' emotional states, facilitating tailored feedback. In workplaces, monitoring employees' emotions can contribute to improved job performance and satisfaction. Recently, emotion recognition has also gained attention in media applications such as automated movie dubbing, where it enhances the naturalness of dubbed performances by synchronizing emotional expression in both audio and visuals. Consequently, multimodal emotion recognition research, which integrates text, speech, and video data, has gained momentum in diverse fields. In this study, we propose an emotion recognition approach that combines text and speech data, specifically incorporating the characteristics of the Korean language. For text data, we use KoELECTRA to generate embeddings, and for speech data, we extract features using HuBERT embeddings. The proposed multimodal transformer model processes text and speech data independently, then learns interactions between the two modalities through a Cross-Modal Attention mechanism. This approach combines complementary information from text and speech, improving the accuracy of emotion recognition. Our experimental results show that the proposed model surpasses single-modality models, achieving an accuracy of 77.01% and an F1-Score of 0.7703 in emotion classification. This study contributes to the advancement of emotion recognition technology by integrating diverse language and modality data, and suggests the potential for further improvements through the inclusion of additional modalities in future work.
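The Cross-Modal Attention step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes standard scaled dot-product attention in which text positions act as queries over speech frames, and the sequence lengths, embedding width (768), projection width (64), random weights, and the function name `cross_modal_attention` are all hypothetical placeholders standing in for real KoELECTRA and HuBERT outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Scaled dot-product attention where one modality queries another.

    query_seq:   (len_q, d_in) embeddings of the querying modality
    context_seq: (len_c, d_in) embeddings of the attended modality
    Returns the attended features (len_q, d_model) and the weights (len_q, len_c).
    """
    q = query_seq @ w_q
    k = context_seq @ w_k
    v = context_seq @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)

# Assumption: random stand-ins for text token embeddings and speech frame
# embeddings; a real pipeline would take these from the two encoders.
text_emb = rng.normal(size=(12, 768))    # 12 text tokens
speech_emb = rng.normal(size=(50, 768))  # 50 speech frames

d_model = 64  # hypothetical projection width
w_q = rng.normal(size=(768, d_model)) * 0.02
w_k = rng.normal(size=(768, d_model)) * 0.02
w_v = rng.normal(size=(768, d_model)) * 0.02

# Text positions attend over speech frames.
text_attends_speech, attn = cross_modal_attention(text_emb, speech_emb, w_q, w_k, w_v)
```

Swapping the first two arguments gives the opposite direction (speech attending over text); how the two directions are combined and classified is not specified in the abstract and is left out of this sketch.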
engineering, electrical & electronic; computer science, information systems; physics, applied