Abstract: Goal: As an essential human-machine interaction task, emotion recognition has emerged as an active research area over the past decades. Although previous attempts to classify emotions have achieved high performance, several challenges remain open: 1) How to effectively recognize emotions using different modalities. 2) Given the increasing amount of computing power that deep learning requires, how to provide real-time detection and improve the robustness of deep neural networks. Method: In this paper, we propose a deep learning-based multimodal emotion recognition (MER) framework called Deep-Emotion, which adaptively integrates the most discriminating features from facial expressions, speech, and electroencephalogram (EEG) signals to improve MER performance. Specifically, the proposed Deep-Emotion framework consists of three branches: the facial branch, the speech branch, and the EEG branch. The facial branch uses the improved GhostNet neural network proposed in this paper for feature extraction, which effectively alleviates overfitting during training and improves classification accuracy compared with the original GhostNet network. For the speech branch, this paper proposes a lightweight fully convolutional neural network (LFCNN) for the efficient extraction of speech emotion features. For the EEG branch, we propose a tree-like LSTM (tLSTM) model capable of fusing multi-stage features for EEG emotion feature extraction. Finally, we adopt a decision-level fusion strategy to integrate the recognition results of the three modalities, resulting in more comprehensive and accurate performance. Results and Conclusions: Extensive experiments on the CK+, EMO-DB, and MAHNOB-HCI datasets demonstrate the effectiveness of the proposed Deep-Emotion method, as well as the feasibility and superiority of the MER approach.
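To make the decision-level fusion step concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state which fusion rule Deep-Emotion uses, so the sketch assumes a weighted average of per-branch class probabilities; the function name `fuse_decisions`, the weights, and the example probability vectors are all hypothetical.

```python
def fuse_decisions(branch_probs, weights=None):
    """Fuse per-branch class probabilities by weighted averaging.

    branch_probs: one probability vector per branch (e.g. facial,
        speech, EEG), all over the same emotion classes.
    weights: optional per-branch weights (assumed, not from the paper);
        defaults to equal weighting.
    Returns (predicted_class_index, fused_probabilities).
    """
    n_branches = len(branch_probs)
    n_classes = len(branch_probs[0])
    if weights is None:
        weights = [1.0] * n_branches
    total = sum(weights)
    # Weighted mean of the branch outputs, class by class.
    fused = [
        sum(w * p[c] for w, p in zip(weights, branch_probs)) / total
        for c in range(n_classes)
    ]
    # The fused decision is the class with the highest fused probability.
    label = max(range(n_classes), key=fused.__getitem__)
    return label, fused


# Hypothetical outputs of the three branches over three emotion classes.
face = [0.6, 0.3, 0.1]
speech = [0.2, 0.5, 0.3]
eeg = [0.5, 0.4, 0.1]

label_equal, fused_equal = fuse_decisions([face, speech, eeg])
label_speech, fused_speech = fuse_decisions([face, speech, eeg], weights=[1, 3, 1])
```

With equal weights the facial and EEG branches outvote the speech branch, while weighting the speech branch more heavily flips the decision. This illustrates why the choice of fusion weights matters in a decision-level scheme.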

Music emotion recognition using deep convolutional neural networks

CNN Based Music Emotion Classification

Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching

Modularized Composite Attention Network for Continuous Music Emotion Recognition

Dimensional Music Emotion Recognition by Machine Learning

A Comparison Study of Deep Learning Methodologies for Music Emotion Recognition

Music Emotions Recognition by Machine Learning with Cognitive Classification Methodologies

Multi-Dimensional Music Emotion Recognition Incorporating Convolutional Neural Networks and Plutchik's Emotion Wheel

Bidirectional Convolutional Recurrent Sparse Network (BCRSN): an Efficient Model for Music Emotion Recognition

Music emotion recognition based on temporal convolutional attention network using EEG

Music Emotion Recognition Based on a Neural Network with an Inception-GRU Residual Structure

Acoustic emotion recognition using deep neural network

Learning Music Emotions via Quantum Convolutional Neural Network

Frequency Embedded Regularization Network for Continuous Music Emotion Recognition

Music Emotion Recognition Through Sparse Canonical Correlation Analysis

Music Emotion Prediction Using Recurrent Neural Networks

Recognition of Music Emotion Based on Forward Neural Network

Multimodal Emotion Recognition based on Facial Expressions, Speech, and EEG

Recurrent Neural Network for MIDI Music Emotion Classification

Dynamic Music Emotion Recognition Based on CNN-BiLSTM

Multimodal Deep Convolutional Neural Network for Audio-Visual Emotion Recognition