Abstract: Electroencephalography (EEG) objectively records the brain's electrical activity during emotional expression and is widely used for emotion recognition. Current methods rely heavily on datasets, and their performance is limited by dataset size and annotation accuracy. Unsupervised and contrastive learning methods, in turn, depend largely on the feature distribution within a dataset, so they must be trained on each specific dataset to achieve optimal results. However, EEG acquisition is influenced by equipment, settings, individuals, and experimental procedures, which produces significant variability across datasets. Consequently, model effectiveness depends heavily on data collected under stringent, controlled conditions. To address these challenges, we introduce a self-supervised pre-training model that processes data from different datasets and operates effectively across them. Pre-training requires no direct access to emotion category labels, so the model can extract universally useful features without predefined downstream tasks. To tackle semantic expression confusion, we employ a masked prediction model that learns bidirectional feature combinations within a sequence, guiding the model to generate richer semantic information. To handle large differences in data distribution, we introduce adaptive clustering techniques that handle this by generating pseudo-labels across multiple categories. During self-supervised training, the model strengthens the expression of hidden features in its intermediate layers, enabling it to learn hidden features common to different datasets.
By constructing a hybrid dataset and conducting extensive experiments, this study demonstrates two key findings: (1) our model performs best on multiple evaluation metrics; (2) the model effectively integrates critical features from different datasets, significantly improving emotion recognition accuracy.
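
The combination of clustering-derived pseudo-labels and masked prediction described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the adaptive clustering procedure, so plain k-means stands in for it here, zeroed frames stand in for a learned mask embedding, and all function names and parameters (`kmeans_pseudo_labels`, `mask_spans`, `build_masked_example`, `k`, `span_len`, `n_spans`) are hypothetical. The sketch only prepares the inputs and targets that a masked-prediction model would be trained on.

```python
import random


def kmeans_pseudo_labels(frames, k, n_iter=20, seed=0):
    """Cluster feature frames with plain k-means; cluster ids act as pseudo-labels.

    Assumption: k-means is a stand-in for the paper's adaptive clustering.
    """
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    labels = [0] * len(frames)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, f in enumerate(frames):
            dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mask_spans(n_frames, span_len, n_spans, seed=0):
    """Return a boolean mask covering n_spans random spans (spans may overlap)."""
    rng = random.Random(seed)
    mask = [False] * n_frames
    for start in rng.sample(range(n_frames - span_len + 1), n_spans):
        for t in range(start, start + span_len):
            mask[t] = True
    return mask


def build_masked_example(frames, k=4, span_len=5, n_spans=3, seed=0):
    """Build (masked_input, targets, mask) for one sequence of feature frames.

    Targets are pseudo-labels at masked positions and -1 elsewhere, so a loss
    would be computed only where the model has to infer the hidden frames
    from context on both sides.
    """
    labels = kmeans_pseudo_labels(frames, k, seed=seed)
    mask = mask_spans(len(frames), span_len, n_spans, seed=seed)
    dim = len(frames[0])
    # Zero vector is a stand-in for a learned mask embedding (assumption).
    masked_input = [[0.0] * dim if m else list(f) for f, m in zip(frames, mask)]
    targets = [lab if m else -1 for lab, m in zip(labels, mask)]
    return masked_input, targets, mask


# Usage with synthetic data: 60 frames of 8 hypothetical EEG features each.
demo_rng = random.Random(1)
demo_frames = [[demo_rng.gauss(0, 1) for _ in range(8)] for _ in range(60)]
masked_input, targets, mask = build_masked_example(demo_frames)
```

Because the targets come from clustering rather than from emotion annotations, the same preparation step can in principle be applied to frames pooled from several datasets, which is the setting the abstract targets.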

Multimodal Multi-View Spectral-Spatial-Temporal Masked Autoencoder for Self-Supervised Emotion Recognition

A Multi-view Spectral-Spatial-Temporal Masked Autoencoder for Decoding Emotions with Self-supervised Learning

A Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Multi-modal emotion analysis from facial expressions and electroencephalogram

Multi-Scale Masked Autoencoders for Cross-Session Emotion Recognition

STM-Net Based Spatial-Temporal Multi-Modal Fusion Network for Emotion Recognition

A Cross-Modal Adaptive Masked Autoencoder for Decoding Emotions with Multimodal Data

MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition

Emotion Recognition Using Cross-Modal Attention from EEG and Facial Expression

MSLTE: multiple self-supervised learning tasks for enhancing EEG emotion recognition

Multimodal Emotion Recognition From EEG Signals and Facial Expressions

Multimodal emotion recognition model via hybrid model with improved feature level fusion on facial and EEG feature set

Multimodal Fused Emotion Recognition about Expression-EEG Interaction and Collaboration Using Deep Learning

Masked self‐supervised pre‐training model for EEG‐based emotion recognition

Multimodal Emotion Recognition based on the Fusion of EEG Signals and Eye Movement Data

Multimodal Emotion Recognition Based on Feature Selection and Extreme Learning Machine in Video Clips

A novel feature fusion network for multimodal emotion recognition from EEG and eye movement signals

Multimodal Adaptive Emotion Transformer with Flexible Modality Inputs on A Novel Dataset with Continuous Labels

Emotion recognition based on multi-modal electrophysiology multi-head attention Contrastive Learning

An autoencoder-based feature level fusion for speech emotion recognition

GMSS: Graph-Based Multi-Task Self-Supervised Learning for EEG Emotion Recognition