Abstract: Emotion recognition plays an essential role in human–computer interaction. Previous studies have investigated facial expressions and electroencephalogram (EEG) signals separately, as single modalities, for emotion recognition, but few have examined their fusion. In this paper, we adopted a multimodal emotion recognition framework that combines facial expressions and EEG, based on a valence-arousal emotional model. For facial expression detection, we followed a transfer learning approach with multi-task convolutional neural network (CNN) architectures to detect the states of valence and arousal. For EEG detection, the two learning targets (valence and arousal) were detected by separate support vector machine (SVM) classifiers. Finally, two decision-level fusion methods, one based on a weight enumeration rule and the other on an adaptive boosting technique, were used to combine facial expressions and EEG. In the experiment, the subjects were instructed to watch clips designed to elicit an emotional response and then to report their emotional state. We used two emotion datasets, the Database for Emotion Analysis using Physiological Signals (DEAP) and MAHNOB-human computer interface (MAHNOB-HCI), to evaluate our method. We also performed an online experiment to make our method more robust. We experimentally demonstrated that our method produces state-of-the-art results in binary valence/arousal classification on the DEAP and MAHNOB-HCI datasets. In the online experiment, we achieved 69.75% accuracy for the valence space and 70.00% accuracy for the arousal space after fusion, each surpassing the best-performing single modality (69.28% for the valence space and 64.00% for the arousal space). The results suggest that combining facial expression and EEG information for emotion recognition compensates for the weaknesses of each as a single information source.
The novelty of this work is as follows. First, we combined facial expressions and EEG to improve the performance of emotion recognition. Second, we used transfer learning techniques to address the lack of data and achieve higher accuracy for facial expression. Finally, in addition to implementing the widely used fusion method based on enumerating different weights between two models, we also explored a novel fusion method that applies a boosting technique.
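
The weight-enumeration fusion mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact rule, so everything here is an assumption made for illustration: each modality is taken to output a score in [0, 1] for the "high" class, the fused score is a weighted average, the decision threshold is 0.5, and the weight is searched on a grid with step 0.01. The function names and the toy scores are hypothetical and are not taken from the paper or its datasets.

```python
def fuse(p_face, p_eeg, w):
    """Weighted average of two per-modality scores, each assumed to lie in [0, 1]."""
    return w * p_face + (1.0 - w) * p_eeg


def accuracy(w, face_scores, eeg_scores, labels, threshold=0.5):
    """Fraction of trials whose thresholded fused score matches the binary label."""
    hits = 0
    for p_face, p_eeg, y in zip(face_scores, eeg_scores, labels):
        pred = 1 if fuse(p_face, p_eeg, w) >= threshold else 0
        hits += int(pred == y)
    return hits / len(labels)


def best_weight(face_scores, eeg_scores, labels, steps=100):
    """Enumerate w = 0, 1/steps, ..., 1 and return the (w, accuracy) pair
    with the highest accuracy; ties go to the smallest w."""
    candidates = [i / steps for i in range(steps + 1)]
    scored = ((w, accuracy(w, face_scores, eeg_scores, labels)) for w in candidates)
    return max(scored, key=lambda pair: pair[1])


# Made-up scores for four trials (illustrative only, not experimental data).
face_scores = [0.9, 0.2, 0.6, 0.4]   # hypothetical facial-expression model outputs
eeg_scores = [0.4, 0.3, 0.7, 0.8]    # hypothetical EEG classifier outputs
labels = [1, 0, 1, 1]                # 1 = high valence, 0 = low valence

w, acc = best_weight(face_scores, eeg_scores, labels)
```

On these toy values each modality alone (w = 1 for face only, w = 0 for EEG only) classifies three of the four trials correctly, while an intermediate weight classifies all four, which is the kind of complementarity the fusion is meant to exploit. In practice the weight would be selected on training data and evaluated on held-out trials.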
