Abstract: Integrating human emotions into multimedia applications shows great potential for enriching user experiences and enhancing engagement across digital platforms. Unlike traditional methods such as questionnaires, facial expressions, and voice analysis, brain signals offer a more direct and objective view of emotional states. However, in electroencephalography (EEG)-based emotion recognition, previous studies have primarily trained and tested EEG models within a single dataset, overlooking the variability across datasets. This oversight leads to significant performance degradation when EEG models are applied to cross-corpus scenarios. In this study, we propose a novel Joint Contrastive learning framework with Feature Alignment (JCFA) to address cross-corpus EEG-based emotion recognition. The JCFA model operates in two main stages. In the pre-training stage, a joint domain contrastive learning strategy is introduced to characterize generalizable time-frequency representations of EEG signals without using labeled data. It extracts robust time-based and frequency-based embeddings for each EEG sample and then aligns them within a shared latent time-frequency space. In the fine-tuning stage, JCFA is refined in conjunction with downstream tasks, taking into account the structural connections among brain electrodes. This can further enhance the model's capability for emotion detection and interpretation. Extensive experimental results on two well-recognized emotional datasets show that the proposed JCFA model achieves state-of-the-art (SOTA) performance, outperforming the second-best method by an average accuracy increase of 4.09% in cross-corpus EEG-based emotion recognition tasks.
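The pre-training idea described in the abstract, pulling the time-based and frequency-based embeddings of the same EEG sample together in a shared latent space, can be illustrated with a generic contrastive objective. The sketch below is not the JCFA loss itself, whose exact form the abstract does not give. It assumes a symmetric InfoNCE-style loss, and the function names, the temperature value, and the random stand-in embeddings are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(z_time, z_freq, temperature=0.2):
    """Symmetric InfoNCE-style loss between two embedding views of one batch.

    Assumption: row i of z_time and row i of z_freq come from the same sample
    (the positive pair); every other row in the batch acts as a negative.
    """
    zt = l2_normalize(z_time)
    zf = l2_normalize(z_freq)
    logits = zt @ zf.T / temperature  # shape (batch, batch)

    def cross_entropy_on_diagonal(l):
        # Numerically stable log-softmax per row, then pick the positive pair.
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the time-to-frequency and frequency-to-time directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))

# Random stand-ins for encoder outputs (hypothetical: 8 samples, 16 dimensions).
rng = np.random.default_rng(0)
z_time = rng.normal(size=(8, 16))
aligned = info_nce(z_time, z_time + 0.01 * rng.normal(size=(8, 16)))
unaligned = info_nce(z_time, rng.normal(size=(8, 16)))
```

With these stand-in inputs, the loss for nearly identical views should come out lower than the loss for unrelated views, which is the behavior such an alignment objective is meant to reward during pre-training.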

Multi-level Feature Joint Learning Methods for Emotional Speaker Recognition

Emotional Speech Clustering Based Robust Speaker Recognition System

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

MFDR: Multiple-stage Fusion and Dynamically Refined Network for Multimodal Emotion Recognition

Applying Emotional Factor Analysis And I-Vector To Emotional Speaker Recognition

Deep Spectrum Feature Representations for Speech Emotion Recognition

Emotional speaker recognition based on similar neighbor phenomenon

Pitch envelope based frame level score reweighed algorithm for emotion robust speaker recognition

Emotional speaker recognition based on i-vector through Atom Aligned Sparse Representation

Cost-Sensitive Learning for Emotion Robust Speaker Recognition

Visual-Audio Emotion Recognition Based on Multi-Task and Ensemble Learning with Multiple Features

Speaker-Independent Speech Emotion Recognition Based on CNN-BLSTM and Multiple SVMs

Speech Emotion Recognition Based on Linear Discriminant Analysis and Support Vector Machine Decision Tree

Mismatched Feature Detection with Finer Granularity for Emotional Speaker Recognition

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

Multi-level attention fusion network assisted by relative entropy alignment for multimodal speech emotion recognition

Emotion embedding framework with emotional self-attention mechanism for speaker recognition

Cross-corpus Speech Emotion Recognition Based on Joint Transfer Subspace Learning and Regression

A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition

Joint Contrastive Learning with Feature Alignment for Cross-Corpus EEG-based Emotion Recognition

SEC-GAN for robust speaker recognition with emotional state dismatch