Abstract: Multimodal emotion recognition has the potential to impact various fields, including human-computer interaction, virtual reality, and emotional intelligence systems. This study introduces a comprehensive framework that improves the accuracy and computational efficiency of emotion recognition by leveraging knowledge distillation and transfer learning, incorporating both unimodal and multimodal models. The framework also combines subject-specific and subject-independent models, balancing localization and generalization. Subject-independent models include EEG-based, non-EEG-based (i.e., electromyography, electrooculography, electrodermal activity, galvanic skin response, skin temperature, respiration, blood volume pulse, heart rate, and eye movements), and multimodal models trained on all training subjects, capturing a broader context. Subject-specific models, likewise EEG-based, non-EEG-based, and multimodal, are trained on individual subjects to provide localized knowledge. The proposed framework then distills knowledge from these teacher models into a student model, using six different distillation losses to combine subject-independent and subject-specific insights. This approach makes the model subject-aware by using local patterns and modality-aware by incorporating unimodal data, improving the robustness and generalizability of emotion recognition systems in varied real-world scenarios. The framework was tested on two well-known datasets, SEED-V and DEAP, as well as an immersive three-dimensional (3D) virtual reality (VR) dataset, GraffitiVR, which captures emotional and behavioral responses from individuals experiencing urban graffiti in a VR environment. Evaluating on all three provides insight into the effectiveness of emotion recognition models in both 2D and 3D settings.
Empirical results show that the proposed knowledge distillation-based model significantly improves performance on all datasets compared with traditional models. Specifically, the model achieved improvements ranging from 6.56% to 24.59% over unimodal models and from 1.56% to 4.11% over multimodal approaches across the SEED-V, DEAP, and GraffitiVR datasets. These results underscore the robustness and effectiveness of the proposed approach and suggest that it significantly enhances emotion recognition across various environmental settings.
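
The distillation step described above can be illustrated with a minimal sketch. The abstract does not specify the form of the six distillation losses, so this example assumes one temperature-scaled KL-divergence term per teacher (three subject-independent and three subject-specific teachers, each EEG-based, non-EEG-based, or multimodal), combined with a cross-entropy term on the true label. The teacher names, the uniform weights, the temperature, and the mixing factor `alpha` are all hypothetical choices for illustration, not values taken from the study.

```python
import math

# Hypothetical teacher names: subject-independent (si) and subject-specific (ss)
# variants of the EEG, non-EEG, and multimodal teachers named in the abstract.
TEACHERS = [
    "si_eeg", "si_non_eeg", "si_multimodal",
    "ss_eeg", "ss_non_eeg", "ss_multimodal",
]


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(logits, label):
    """Cross-entropy of the (temperature-1) softmax against the true class index."""
    return -math.log(softmax(logits)[label] + 1e-12)


def multi_teacher_loss(student_logits, teacher_logits, label,
                       weights=None, temperature=2.0, alpha=0.5):
    """Assumed combined loss: alpha * hard-label loss + (1 - alpha) * distillation.

    The distillation part is a weighted sum of six KL terms, one per teacher.
    """
    if weights is None:
        # Assumption: every teacher contributes equally.
        weights = {name: 1.0 / len(TEACHERS) for name in TEACHERS}
    student_soft = softmax(student_logits, temperature)
    distill = 0.0
    for name in TEACHERS:
        teacher_soft = softmax(teacher_logits[name], temperature)
        # The T^2 factor is the usual rescaling for softened distributions.
        distill += weights[name] * kl_divergence(teacher_soft, student_soft) * temperature ** 2
    hard = cross_entropy(student_logits, label)
    return alpha * hard + (1 - alpha) * distill


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]  # toy logits over three emotion classes
    teachers = {name: [1.5, 1.0, -0.5] for name in TEACHERS}
    print(multi_teacher_loss(student, teachers, label=0))
```

In this sketch, a student whose softened predictions match every teacher incurs no distillation penalty, so only the hard-label term remains; per-teacher weights could be adjusted to favor subject-specific or subject-independent knowledge.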