Abstract: The spatial information in electroencephalography (EEG) signals is essential for emotion recognition models to learn discriminative features. Convolutional and recurrent networks are the conventional choices for learning the complex spatial dependencies among multiple electrodes and brain regions. However, because these models rely on local feature learning operations, they have difficulty capturing long-range dependencies. To better capture EEG spatial dependencies and improve emotion recognition accuracy, we propose a transformer-based model that hierarchically learns discriminative spatial information from the electrode level to the brain-region level. In the electrode-level spatial learning, transformer encoders are adopted to integrate information within each brain region. Next, since brain regions play different roles in emotion recognition, the self-attention mechanism of the transformer can emphasize the most contributive regions. Hence, in the brain-region-level spatial learning, a transformer encoder is utilized to capture the spatial dependencies among brain regions. Finally, to validate the effectiveness of the proposed model, subject-independent experiments are conducted on the DEAP and MAHNOB-HCI databases. The experimental results demonstrate that the proposed model achieves outstanding emotion recognition performance for both arousal and valence levels. Moreover, visualization of the self-attention weights indicates that the proposed model emphasizes discriminative spatial information from the prefrontal, frontal, temporal, and parietal lobes.
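The two-level design described in the abstract can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: each "transformer encoder" is reduced to a single-head self-attention layer with a residual connection (no layer normalization, feed-forward sublayer, or multi-head split), each region is summarized by mean-pooling its electrode outputs, the weights are random rather than trained, and the index-based grouping of 32 channels into four regions, along with the names `hierarchical_spatial_encoder`, `regions`, and `init_weights`, is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows of x.

    x has shape (n_tokens, d); returns the residual output (n_tokens, d)
    and the attention matrix (n_tokens, n_tokens).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    return x + attn @ v, attn


def hierarchical_spatial_encoder(feats, regions, params):
    """Electrode-level attention inside each region, then region-level attention.

    feats:   (n_electrodes, d) features for one EEG segment.
    regions: dict of region name -> electrode indices (hypothetical grouping).
    params:  {"electrode": {region name: (wq, wk, wv)}, "region": (wq, wk, wv)}
    """
    region_tokens = []
    for name, idx in regions.items():
        # Electrode level: a separate encoder per region; electrodes attend
        # only to other electrodes of the same region.
        out, _ = self_attention(feats[idx], *params["electrode"][name])
        region_tokens.append(out.mean(axis=0))  # mean-pool into one region token
    tokens = np.stack(region_tokens)  # (n_regions, d)
    # Brain-region level: one encoder captures dependencies among regions.
    out, region_attn = self_attention(tokens, *params["region"])
    return out.mean(axis=0), region_attn


rng = np.random.default_rng(0)
d = 8  # hypothetical per-electrode feature size

# Hypothetical split of 32 channels into four regions by index; a real
# model would group channels by their scalp positions.
regions = {
    "prefrontal": list(range(0, 8)),
    "frontal": list(range(8, 16)),
    "temporal": list(range(16, 24)),
    "parietal": list(range(24, 32)),
}


def init_weights():
    # Random (untrained) query, key, and value projections.
    return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))


params = {
    "electrode": {name: init_weights() for name in regions},
    "region": init_weights(),
}
feats = rng.normal(size=(32, d))
vec, attn = hierarchical_spatial_encoder(feats, regions, params)
print(vec.shape, attn.shape)  # (8,) (4, 4)
```

Because the electrode-level attention is computed separately for each region, it can only mix information within a region; cross-region dependencies are left to the region-level layer, whose attention matrix `attn` is the kind of map that could be visualized to see which regions receive the most weight.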

A Hierarchical Transformer with Speaker Modeling for Emotion Recognition in Conversation

A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations

Watch the Speakers: A Hybrid Continuous Attribution Network for Emotion Recognition in Conversation With Emotion Disentanglement

Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation

Emotion Recognition in Conversations with Emotion Shift Detection Based on Multi-Task Learning

Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation

Multi-Scale Temporal Transformer for Speech Emotion Recognition

A Simple and Interactive Transformer for Fine-Grained Emotion Detection

CTNet: Conversational Transformer Network for Emotion Recognition

Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation

EmotionIC: Emotional Inertia and Contagion-Driven Dependency Modeling for Emotion Recognition in Conversation

Multilevel Transformer for Multimodal Emotion Recognition

SI-LSTM: Speaker Hybrid Long-short Term Memory and Cross Modal Attention for Emotion Recognition in Conversation

Dialogue Emotion Model Based on Local–Global Context Encoder and Commonsense Knowledge Fusion Attention

Transformers for EEG-Based Emotion Recognition: A Hierarchical Spatial Information Learning Model

A Contextual Attention Network for Multimodal Emotion Recognition in Conversation

Speech Emotion Recognition via CNN-Transformer and Multidimensional Attention Mechanism

Enhancing Emotion Recognition in Conversation Via Multi-view Feature Alignment and Memorization

InstructERC: Reforming Emotion Recognition in Conversation with Multi-task Retrieval-Augmented Large Language Models

Context-Dependent Embedding Utterance Representations for Emotion Recognition in Conversations