Abstract: Electroencephalography (EEG) objectively records the brain's electrical activity during emotional expression and has been widely used for emotion recognition. Current methods rely heavily on datasets, and their performance is limited by dataset size and annotation accuracy. Unsupervised and contrastive learning methods, in turn, depend largely on the feature distribution within a dataset and therefore need dataset‐specific training to achieve optimal results. However, EEG acquisition is influenced by equipment, settings, individuals, and experimental procedures, which leads to significant variability across recordings. Consequently, model effectiveness depends heavily on data collected under stringent, controlled conditions. To address these challenges, we introduce a novel self‐supervised pre‐training model that processes data from different datasets and operates effectively across them. Pre‐training requires no access to emotion category labels, so the model can extract universally useful features without a predefined downstream task. To tackle semantic expression confusion, we employ masked prediction, which guides the model to generate richer semantic information by learning bidirectional feature combinations in the sequence. To handle large differences in data distribution, we introduce adaptive clustering, which manages these differences by generating pseudo‐labels across multiple categories. During self‐supervised training, the model enhances the expression of hidden features in its intermediate layers, enabling it to learn hidden features common to different datasets.
By constructing a hybrid dataset and conducting extensive experiments, we demonstrate two key findings: (1) our model achieves the best performance on multiple evaluation metrics; (2) the model effectively integrates critical features from different datasets, significantly improving emotion recognition accuracy.
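
The abstract describes masked prediction against cluster‐derived pseudo‐labels only at a high level. The sketch below is one possible reading of that idea, not the authors' implementation. It makes several assumptions: plain k‐means with a fixed `k` stands in for the adaptive clustering, random vectors stand in for EEG feature frames, random logits stand in for the encoder output, and all function names are hypothetical.

```python
import numpy as np

def kmeans_pseudo_labels(frames, k, n_iter=20, seed=0):
    """Cluster frames with plain k-means; cluster ids act as pseudo-labels.

    Assumption: fixed-k k-means replaces the paper's adaptive clustering.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct frames (fancy indexing copies).
    centroids = frames[rng.choice(len(frames), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every frame to every centroid.
        dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = frames[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels

def mask_frames(frames, mask_ratio=0.3, seed=0):
    """Zero out a random subset of frames and return (corrupted, boolean mask)."""
    rng = np.random.default_rng(seed)
    n_masked = max(1, int(round(mask_ratio * len(frames))))
    mask = np.zeros(len(frames), dtype=bool)
    mask[rng.choice(len(frames), size=n_masked, replace=False)] = True
    corrupted = frames.copy()
    corrupted[mask] = 0.0
    return corrupted, mask

def masked_cross_entropy(logits, labels, mask):
    """Cross-entropy against pseudo-labels, averaged over masked frames only."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked[mask].mean()

rng = np.random.default_rng(42)
frames = rng.normal(size=(200, 16))      # stand-in for EEG feature frames
labels = kmeans_pseudo_labels(frames, k=4)
corrupted, mask = mask_frames(frames, mask_ratio=0.3)
logits = rng.normal(size=(200, 4))       # stand-in for an encoder applied to `corrupted`
loss = masked_cross_entropy(logits, labels, mask)
```

In this reading, the loss is computed only at masked positions, so the encoder has to infer each hidden frame's cluster from the unmasked context on both sides. No emotion labels are involved at any step.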

A Multi-view Spectral-Spatial-Temporal Masked Autoencoder for Decoding Emotions with Self-supervised Learning

Multimodal Multi-View Spectral-Spatial-Temporal Masked Autoencoder for Self-Supervised Emotion Recognition

Multi-view Domain-Adaptive Representation Learning for EEG-based Emotion Recognition

Multi-Scale Masked Autoencoders for Cross-Session Emotion Recognition

A Cross-Modal Adaptive Masked Autoencoder for Decoding Emotions with Multimodal Data

MSLTE: multiple self-supervised learning tasks for enhancing EEG emotion recognition

Masked self‐supervised pre‐training model for EEG‐based emotion recognition

MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition

Multi-Modal Domain Adaptation Variational Autoencoder for EEG-Based Emotion Recognition

GMAEEG: A Self-Supervised Graph Masked Autoencoder for EEG Representation Learning

Latent Factor Decoding of Multi-Channel EEG for Emotion Recognition Through Autoencoder-Like Neural Networks

HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition

Deep Sparse Autoencoder and Recursive Neural Network for EEG Emotion Recognition

EEG-SCMM: Soft Contrastive Masked Modeling for Cross-Corpus EEG-Based Emotion Recognition

Boosting Continuous Emotion Recognition with Self-Pretraining using Masked Autoencoders, Temporal Convolutional Networks, and Transformers

A Bi-Stream hybrid model with MLPBlocks and self-attention mechanism for EEG-based emotion recognition

Multi-View Multi-Label Fine-Grained Emotion Decoding From Human Brain Activity

A Model for EEG-Based Emotion Recognition: CNN-Bi-LSTM with Attention Mechanism

Multi-modal Facial Affective Analysis based on Masked Autoencoder

Variational Autoencoder Based Latent Factor Decoding of Multichannel EEG for Emotion Recognition