Abstract: Electroencephalogram (EEG), a tool that objectively records the brain's electrical signals during emotional expression, has been widely used in emotion recognition. Current approaches rely heavily on labeled datasets, so their performance is limited by dataset size and annotation accuracy. Unsupervised and contrastive learning methods, in turn, depend largely on the feature distribution within a dataset and therefore must be trained on each specific dataset to perform well. However, EEG acquisition is affected by factors such as equipment, recording settings, individual subjects, and experimental procedures, which introduces significant variability. Consequently, model effectiveness depends heavily on datasets collected under strictly controlled conditions.

To address these challenges, we introduce a self‐supervised pre‐training model that processes data from different datasets and operates effectively across them. Pre‐training requires no access to emotion category labels, so the model can extract generally useful features without a predefined downstream task. To resolve confusion in semantic expression, we employ masked prediction, which guides the model to produce richer semantic information by learning bidirectional feature combinations within a sequence. To handle large differences in data distribution, we introduce adaptive clustering, which generates pseudo‐labels over multiple categories. During self‐supervised training, the model strengthens the hidden features expressed in its intermediate layers, enabling it to learn hidden features common to different datasets.
By constructing a hybrid dataset and conducting extensive experiments, this study demonstrates two key findings: (1) our model achieves the best performance on multiple evaluation metrics; (2) the model effectively integrates critical features from different datasets, significantly improving emotion recognition accuracy.
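The abstract combines two ideas: clustering frame features into pseudo‐labels, then training the model to predict those labels at masked positions. The sketch below shows one common way to wire these together, in the style popularized for speech by HuBERT. It is an illustration under stated assumptions, not the authors' implementation. Plain k‐means stands in for the paper's adaptive clustering, whose details are not given here. Random features stand in for encoder outputs from a hybrid EEG dataset, and random logits stand in for the model's predictions. All function and parameter names (`kmeans_pseudo_labels`, `span_mask`, `masked_prediction_loss`, `mask_prob`, `span_len`) are hypothetical.

```python
import numpy as np


def kmeans_pseudo_labels(features, k, n_iter=20, seed=0):
    """Plain k-means over frame features (stand-in for the paper's adaptive clustering)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct frames.
    centroids = features[rng.choice(len(features), size=k, replace=False)].astype(float)

    def assign(c):
        # Squared Euclidean distance from every frame to every centroid: shape (T, k).
        dists = ((features[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return assign(centroids), centroids  # final labels match the final centroids


def span_mask(num_frames, mask_prob, span_len, rng):
    """Pick span starts with probability mask_prob; hide span_len frames from each start."""
    mask = np.zeros(num_frames, dtype=bool)
    for s in np.flatnonzero(rng.random(num_frames) < mask_prob):
        mask[s:s + span_len] = True
    return mask


def masked_prediction_loss(logits, pseudo_labels, mask):
    """Cross-entropy against cluster pseudo-labels, averaged over masked frames only."""
    if not mask.any():
        return 0.0
    # Numerically stable log-softmax over the cluster dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(pseudo_labels)), pseudo_labels]
    return float(nll[mask].mean())


rng = np.random.default_rng(0)
# Hypothetical frame features from two recordings with shifted distributions
# (e.g. different devices); real features would come from the encoder.
features = np.vstack([rng.normal(0.0, 1.0, (50, 8)), rng.normal(3.0, 1.0, (50, 8))])
labels, _ = kmeans_pseudo_labels(features, k=4)
mask = span_mask(len(features), mask_prob=0.1, span_len=5, rng=rng)
logits = rng.normal(size=(len(features), 4))  # placeholder for model predictions
print(f"masked frames: {mask.sum()}, loss: {masked_prediction_loss(logits, labels, mask):.3f}")
```

The loss is computed only on masked frames, so the model must infer hidden content from the surrounding context on both sides. This is the sense in which masked prediction learns bidirectional feature combinations. Because the pseudo‐labels come from clustering pooled features rather than from emotion annotations, the targets require no labels from any single dataset.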