Video Multimodal Emotion Recognition System for Real World Applications

Sun-Kyung Lee, Jong-Hwan Kim
DOI: https://doi.org/10.48550/arXiv.2308.14320
IF: 6.4588
2023-08-28
Human-Computer Interaction
Abstract: This paper proposes a system that recognizes a speaker's utterance-level emotion from multimodal cues in a video. The system integrates multiple AI models to first extract and pre-process multimodal information from the raw video input. Next, an end-to-end multimodal emotion recognition (MER) model sequentially predicts the speaker's emotions at the utterance level. Additionally, users can demonstrate the system interactively through the implemented interface.
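
The two-stage flow in the abstract (extract and pre-process per-utterance multimodal information, then predict emotions utterance by utterance) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Utterance` fields, the `recognize_emotions` function, and the keyword-based `toy_model` are hypothetical stand-ins and do not reflect the models or interfaces actually used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass
class Utterance:
    """One pre-processed utterance (hypothetical container, not from the paper)."""
    start: float              # start time in seconds
    end: float                # end time in seconds
    text: str                 # transcript of the utterance
    audio: Sequence[float]    # placeholder for audio features
    frames: Sequence[str]     # placeholder for face-crop frame identifiers


def recognize_emotions(
    utterances: Iterable[Utterance],
    model: Callable[[Utterance], str],
) -> List[Tuple[Utterance, str]]:
    """Run the emotion model over utterances in temporal order."""
    ordered = sorted(utterances, key=lambda u: u.start)
    return [(u, model(u)) for u in ordered]


def toy_model(u: Utterance) -> str:
    """Stand-in for an MER model: a keyword rule on the transcript only."""
    return "happy" if "great" in u.text.lower() else "neutral"


if __name__ == "__main__":
    clips = [
        Utterance(3.0, 5.0, "That is great news!", [0.2], ["f3"]),
        Utterance(0.0, 2.5, "I will check the report.", [0.1], ["f0"]),
    ]
    for utt, label in recognize_emotions(clips, toy_model):
        print(f"{utt.start:.1f}-{utt.end:.1f}s: {label}")
```

In a real system the `model` argument would be replaced by a trained network that consumes the text, audio, and visual fields together; the sketch only shows how utterance-level inputs might be ordered and passed through one predictor.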