A multimodal emotion recognition model integrating speech, video and MoCAP
Ning Jia, Chunjun Zheng, Wei Sun
DOI: https://doi.org/10.1007/s11042-022-13091-9
IF: 2.577
2022-04-13
Multimedia Tools and Applications
Abstract: As one of the core technologies in human-computer interaction, emotion recognition aims to simulate the human process of perceiving and understanding emotion. It is widely used in medicine, education, daily life, transportation and other fields, yet it remains a challenging problem. This paper examines the accuracy of multimodal emotion recognition: emotion features are extracted from speech, video and motion capture (MoCAP) using deep learning methods, and a matching model, the facial motion speech emotion recognition (FM-SER) model, is designed. In the audio modality, dual spectrograms are designed to capture the local and global information of speech in the time and frequency domains, and convolutional neural networks (CNN), gated recurrent units (GRU) and attention models are used for speech emotion recognition. In the video modality, an attention-based 3D CNN model captures latent emotional expression. Sequential features of hand and head movements are extracted from MoCAP and fed into a bidirectional three-layer long short-term memory (LSTM) model with an attention mechanism. Exploiting the complementary relationship between the modalities, a decision-level fusion scheme is designed to achieve higher precision and stronger generalization. In extensive experiments, the proposed model was compared with several popular emotion recognition models on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. The results show that the proposed method achieves higher recognition accuracy in both single-modality and multimodal settings, with average accuracy improved by 16.3% and 9%, respectively, which demonstrates the effectiveness of the FM-SER model for emotion recognition.
computer science, information systems, theory & methods,engineering, electrical & electronic, software engineering
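
The abstract describes a decision-level scheme that combines the outputs of the speech, video and MoCAP branches, but it does not state the combination rule. The sketch below illustrates one common choice, a weighted average of per-modality class probabilities, purely as an assumption. The label set `EMOTIONS`, the function names `fuse_decisions` and `predict`, the weights and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical four-class label set; the abstract does not list the classes.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def fuse_decisions(
    modality_probs: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Late fusion: weighted average of per-modality class probabilities.

    Assumption: each modality branch has already produced a probability
    vector over EMOTIONS. The paper's actual fusion rule may differ.
    """
    if not modality_probs:
        raise ValueError("at least one modality is required")
    if weights is None:
        # Default to equal weighting across the supplied modalities.
        weights = {name: 1.0 for name in modality_probs}
    total = sum(weights[name] for name in modality_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * len(EMOTIONS)
    for name, probs in modality_probs.items():
        if len(probs) != len(EMOTIONS):
            raise ValueError(f"{name}: expected {len(EMOTIONS)} scores")
        w = weights[name] / total  # normalise so the fused scores sum to 1
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused


def predict(
    modality_probs: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Return the label with the highest fused probability."""
    fused = fuse_decisions(modality_probs, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


# Made-up scores for one utterance, one vector per modality branch.
scores = {
    "speech": [0.10, 0.50, 0.30, 0.10],
    "video": [0.20, 0.30, 0.40, 0.10],
    "mocap": [0.10, 0.60, 0.20, 0.10],
}
print(predict(scores))
```

With these made-up scores the video branch alone would favour "neutral", while the speech and MoCAP branches favour "happy", so the equal-weight fusion follows the majority. Passing a `weights` dictionary lets one modality count for more, which is one simple way to model the complementary relationship between modalities that the abstract mentions.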