Abstract: Automatic Emotion Recognition (AER) is critical for naturalistic Human-Machine Interaction (HMI). Emotions can be detected through both external behaviors, e.g., tone of voice, and internal physiological signals, e.g., the electroencephalogram (EEG). In this paper, we first construct a multi-modal emotion database, the Multi-modal Emotion Database with four modalities (MED4). MED4 consists of synchronously recorded EEG, photoplethysmography, speech and facial images of participants exposed to video stimuli designed to induce happy, sad, angry and neutral emotions. The experiment was performed with 32 participants under two environmental conditions: a research lab with natural noise and an anechoic chamber. Four baseline algorithms were developed to verify the database and to assess the performance of AER methods: Identification-vector + Probabilistic Linear Discriminant Analysis (I-vector + PLDA), Temporal Convolutional Network (TCN), Extreme Learning Machine (ELM) and Multi-Layer Perceptron (MLP). Furthermore, two fusion strategies, one at the feature level and one at the decision level, were designed to exploit both external and internal information about a person's state. The results show that EEG signals yield higher emotion recognition accuracy than speech signals (88.92% in the anechoic room and 89.70% in the naturally noisy room vs. 64.67% and 58.92%, respectively). Fusion strategies that combine speech and EEG signals improve overall accuracy by 25.92 percentage points over speech alone and 1.67 percentage points over EEG alone in the anechoic room, and by 31.74 and 0.96 percentage points, respectively, in the naturally noisy room. Fusion also makes AER more robust in the noisy environment. The MED4 database will be made publicly available to encourage researchers worldwide to develop and validate advanced methods for AER.
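The abstract names feature-level and decision-level fusion but does not specify how either is implemented. As a rough illustration only, the two ideas might be sketched as below. The function names, the concatenation rule, the weighted-average rule, the `eeg_weight` parameter and the example probabilities are all assumptions made for this sketch, not the authors' method or data.

```python
# Hypothetical sketch of two generic fusion strategies for speech + EEG.
# Nothing here is taken from the MED4 paper beyond the four emotion labels.

EMOTIONS = ["happy", "sad", "angry", "neutral"]


def feature_level_fusion(speech_features, eeg_features):
    """Join the two modality feature vectors into one vector.

    A single classifier would then be trained on the joined vector.
    """
    return list(speech_features) + list(eeg_features)


def decision_level_fusion(speech_probs, eeg_probs, eeg_weight=0.5):
    """Combine per-class probabilities from two separate classifiers.

    Uses a weighted average (an assumed rule) and renormalizes so the
    fused scores sum to 1.
    """
    fused = [
        (1.0 - eeg_weight) * s + eeg_weight * e
        for s, e in zip(speech_probs, eeg_probs)
    ]
    total = sum(fused)
    return [p / total for p in fused]


# Made-up classifier outputs for one sample, ordered as in EMOTIONS.
speech_probs = [0.40, 0.30, 0.20, 0.10]
eeg_probs = [0.10, 0.70, 0.10, 0.10]

fused_probs = decision_level_fusion(speech_probs, eeg_probs, eeg_weight=0.7)
predicted = EMOTIONS[fused_probs.index(max(fused_probs))]
```

In this toy case the speech classifier alone would pick "happy", while the fused decision follows the more heavily weighted EEG classifier. Weighting EEG more strongly is one plausible choice given its higher reported accuracy, but it remains an assumption of the sketch.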

EAV: EEG-Audio-Video Dataset for Emotion Recognition in Conversational Contexts

An Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

EEG Dataset for the Recognition of Different Emotions Induced in Voice-User Interaction

Emotion Recognition With Audio, Video, EEG, and EMG: A Dataset and Baseline Approaches

Exploiting EEG signals and audiovisual feature fusion for video emotion recognition

Electroencephalogram Emotion Recognition Based on Empirical Mode Decomposition and Optimal Feature Selection

Multi-modal emotion analysis from facial expressions and electroencephalogram

Multi-modal emotion recognition using EEG and speech signals

K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations

MPED: A Multi-Modal Physiological Emotion Database for Discrete Emotion Recognition

Multi-view Domain-Adaptive Representation Learning for EEG-based Emotion Recognition

An EEG-Based Multi-Modal Emotion Database with Both Posed and Authentic Facial Actions for Emotion Analysis

DEAP: A Database for Emotion Analysis Using Physiological Signals

A Large Finer-grained Affective Computing EEG Dataset

Context Based Emotion Recognition Using EMOTIC Dataset

MEAD: A Large-Scale Audio-Visual Dataset for Emotional Talking-Face Generation

Multimodal Emotion Recognition Based on EEG and EOG Signals Evoked by the Video-Odor Stimuli

Valence-Arousal Model based Emotion Recognition using EEG, peripheral physiological signals and Facial Expression

Emotion Recognition in Conversations Using Brain and Physiological Signals

eSEE-d: Emotional State Estimation Based on Eye-Tracking Dataset

Emotion Recognition With Temporarily Localized 'Emotional Events' in Naturalistic Context