Abstract: Automatic Emotion Recognition (AER) is critical for naturalistic Human-Machine Interaction (HMI). Emotions can be detected through both external behaviors, e.g., tone of voice, and internal physiological signals, e.g., electroencephalogram (EEG). In this paper, we first constructed a multi-modal emotion database, named the Multi-modal Emotion Database with four modalities (MED4). MED4 consists of synchronously recorded EEG, photoplethysmography, speech and facial-image signals from participants exposed to video stimuli designed to induce happy, sad, angry and neutral emotions. The experiment was performed with 32 participants under two environmental conditions: a research lab with natural noise and an anechoic chamber. Four baseline algorithms were developed to verify the database and the performance of AER methods: Identification-vector + Probabilistic Linear Discriminant Analysis (I-vector + PLDA), Temporal Convolutional Network (TCN), Extreme Learning Machine (ELM) and Multi-Layer Perceptron (MLP). Furthermore, two fusion strategies, one at the feature level and one at the decision level, were designed to exploit both external and internal information about a person's emotional state. The results showed that EEG signals yield higher emotion recognition accuracy than speech signals (88.92% in the anechoic room and 89.70% in the naturally noisy room, vs. 64.67% and 58.92%, respectively). Fusion strategies that combine speech and EEG signals improve overall accuracy by 25.92 percentage points over speech and 1.67 percentage points over EEG in the anechoic room, and by 31.74 and 0.96 percentage points, respectively, in the naturally noisy room. Fusion methods also enhance the robustness of AER in the noisy environment. The MED4 database will be made publicly available to encourage researchers worldwide to develop and validate advanced methods for AER.
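The abstract names feature-level and decision-level fusion without specifying the rules used. As a minimal sketch of the general distinction only, the example below assumes feature-level fusion means concatenating per-modality feature vectors and decision-level fusion means a weighted average of per-modality class probabilities. The function names, the `eeg_weight` parameter, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# The four emotion classes induced in MED4.
EMOTIONS = ["happy", "sad", "angry", "neutral"]


def feature_level_fusion(speech_feat: Sequence[float],
                         eeg_feat: Sequence[float]) -> List[float]:
    """Assumed rule: concatenate modality features into one vector,
    which a single downstream classifier would then consume."""
    return list(speech_feat) + list(eeg_feat)


def decision_level_fusion(speech_probs: Sequence[float],
                          eeg_probs: Sequence[float],
                          eeg_weight: float = 0.5) -> List[float]:
    """Assumed rule: weighted average of the class probabilities
    produced by two separately trained per-modality classifiers."""
    if len(speech_probs) != len(eeg_probs):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= eeg_weight <= 1.0:
        raise ValueError("eeg_weight must lie in [0, 1]")
    fused = [(1.0 - eeg_weight) * s + eeg_weight * e
             for s, e in zip(speech_probs, eeg_probs)]
    total = sum(fused)
    # Renormalize so the fused scores still sum to one.
    return [p / total for p in fused]


def predict(probs: Sequence[float]) -> str:
    """Return the emotion label with the highest fused score."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return EMOTIONS[best]


# Toy inputs (made up for illustration, not MED4 data).
speech_probs = [0.4, 0.3, 0.2, 0.1]
eeg_probs = [0.1, 0.6, 0.2, 0.1]
fused_probs = decision_level_fusion(speech_probs, eeg_probs)
fused_label = predict(fused_probs)
fused_features = feature_level_fusion([1.0, 2.0], [3.0])
```

In this toy case the speech classifier alone would pick "happy", while the fused scores favor the class the EEG classifier is confident about. Raising `eeg_weight` shifts more trust toward EEG, which is one plausible way to account for the accuracy gap between the two modalities reported above.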
