Abstract:
Goal: As an essential human-machine interaction task, emotion recognition has attracted growing attention over the past decades. Although previous attempts to classify emotions have achieved high performance, several challenges remain open: 1) how to effectively recognize emotions by combining different modalities; 2) how to provide real-time detection and improve the robustness of deep neural networks, given the increasing computing power that deep learning requires.

Method: In this paper, we propose a deep learning-based multimodal emotion recognition (MER) framework called Deep-Emotion, which adaptively integrates the most discriminative features from facial expressions, speech, and electroencephalogram (EEG) signals to improve MER performance. Specifically, the Deep-Emotion framework consists of three branches: the facial branch, the speech branch, and the EEG branch. The facial branch uses an improved GhostNet, proposed in this paper, for feature extraction; compared with the original GhostNet, it effectively alleviates overfitting during training and improves classification accuracy. For the speech branch, we propose a lightweight fully convolutional neural network (LFCNN) for efficient extraction of speech emotion features. For the EEG branch, we propose a tree-like LSTM (tLSTM) model that fuses multi-stage features for EEG emotion feature extraction. Finally, we adopt decision-level fusion to integrate the recognition results of the three modalities, yielding more comprehensive and accurate recognition.

Results and Conclusions: Extensive experiments on the CK+, EMO-DB, and MAHNOB-HCI datasets demonstrate the effectiveness of the proposed Deep-Emotion method, as well as the feasibility and superiority of the MER approach.
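Decision-level (late) fusion combines the outputs of the facial, speech, and EEG classifiers rather than their intermediate features. The abstract does not state which combination rule Deep-Emotion uses. The sketch below therefore assumes a weighted average of per-branch class probabilities. The branch names, the emotion label set, and the weights are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical emotion label set; the abstract does not list the classes used.
EMOTIONS = ["happy", "sad", "angry", "neutral"]


def fuse_decisions(
    branch_probs: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-branch class probabilities (decision-level fusion).

    branch_probs maps a branch name (e.g. "face", "speech", "eeg") to the
    softmax output of that branch's classifier. The weighted-average rule is
    an assumption for illustration, not necessarily Deep-Emotion's rule.
    """
    n_classes = len(next(iter(branch_probs.values())))
    total_weight = sum(weights[name] for name in branch_probs)
    fused = [0.0] * n_classes
    for name, probs in branch_probs.items():
        if len(probs) != n_classes:
            raise ValueError(
                f"branch {name!r} has {len(probs)} classes, expected {n_classes}"
            )
        for k, p in enumerate(probs):
            fused[k] += weights[name] * p
    # Divide by the total weight so the fused scores still sum to 1.
    return [score / total_weight for score in fused]


def predict(branch_probs: Dict[str, Sequence[float]], weights: Dict[str, float]) -> str:
    """Return the label with the highest fused score."""
    fused = fuse_decisions(branch_probs, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


if __name__ == "__main__":
    # Hypothetical softmax outputs of the three branches for one sample.
    branch_probs = {
        "face": [0.6, 0.1, 0.1, 0.2],
        "speech": [0.2, 0.5, 0.1, 0.2],
        "eeg": [0.3, 0.2, 0.1, 0.4],
    }
    # Hypothetical reliability weights per modality.
    weights = {"face": 0.5, "speech": 0.3, "eeg": 0.2}
    print(predict(branch_probs, weights))  # prints "happy"
```

Weighted averaging lets a more reliable modality count for more while every branch can still influence the final decision. Majority voting or a product of probabilities are common alternatives for late fusion.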
