Abstract: Multimodal emotion recognition has the potential to impact various fields, including human-computer interaction, virtual reality, and emotional intelligence systems. This study introduces a comprehensive framework that improves both the accuracy and the computational efficiency of emotion recognition by leveraging knowledge distillation and transfer learning across unimodal and multimodal models. The framework also combines subject-specific and subject-independent models, balancing localization and generalization. Subject-independent models are trained on all training subjects to capture a broader context; they include EEG-based, non-EEG-based (i.e., electromyography, electrooculography, electrodermal activity, galvanic skin response, skin temperature, respiration, blood volume pulse, heart rate, and eye movements), and multimodal models. Subject-specific models, likewise EEG-based, non-EEG-based, and multimodal, are trained on individual subjects to provide localized knowledge. The framework then distills knowledge from these teacher models into a student model, using six different distillation losses to combine subject-independent and subject-specific insights. Using local patterns makes the student subject-aware, and incorporating unimodal data makes it modality-aware, which improves the robustness and generalizability of emotion recognition in varied real-world scenarios. The framework was evaluated on two well-known datasets, SEED-V and DEAP, and on an immersive three-dimensional (3D) virtual reality (VR) dataset, GraffitiVR, which captures the emotional and behavioral responses of individuals experiencing urban graffiti in a VR environment. Covering both 2D and 3D settings allows the effectiveness of emotion recognition models to be assessed across a wider range of conditions.
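The abstract does not define the six distillation losses. As a rough illustration only, the sketch below assumes one loss per teacher (subject-independent and subject-specific EEG, non-EEG, and multimodal models, giving six teachers) and uses standard soft-target distillation: a temperature-scaled KL divergence from each teacher's softened predictions to the student's, scaled by T², plus a cross-entropy term on the true label. The teacher names, the per-teacher weights, the temperature, and the mixing factor `alpha` are all hypothetical choices, not the paper's settings.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical teacher identifiers: "si" = subject-independent, "ss" = subject-specific.
TEACHER_NAMES = (
    "si_eeg", "si_non_eeg", "si_multimodal",
    "ss_eeg", "ss_non_eeg", "ss_multimodal",
)


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); softmax outputs are strictly positive, so log(p / q) is defined."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Hard-label cross-entropy at temperature 1."""
    return -math.log(softmax(logits)[label])


def multi_teacher_kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Dict[str, Sequence[float]],
    label: int,
    weights: Dict[str, float],
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """(1 - alpha) * CE(student, label) + alpha * T^2 * sum_k w_k * KL(teacher_k || student)."""
    p_student = softmax(student_logits, temperature)
    kd = 0.0
    for name, t_logits in teacher_logits.items():
        p_teacher = softmax(t_logits, temperature)
        kd += weights[name] * kl_divergence(p_teacher, p_student)
    kd *= temperature ** 2  # keeps soft-target gradients on a scale comparable to CE
    return (1.0 - alpha) * cross_entropy(student_logits, label) + alpha * kd


if __name__ == "__main__":
    student = [1.2, 0.3, -0.5]  # illustrative logits for three emotion classes
    teachers = {name: [1.0, 0.8, -0.2] for name in TEACHER_NAMES}
    equal_weights = {name: 1.0 / len(TEACHER_NAMES) for name in TEACHER_NAMES}
    print(multi_teacher_kd_loss(student, teachers, label=0, weights=equal_weights))
```

Equal teacher weights are the simplest choice; a framework that balances localization and generalization might instead weight subject-specific and subject-independent teachers differently, or learn the weights.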
Empirical results show that the proposed knowledge distillation-based model outperforms traditional models on all datasets. Specifically, across the SEED-V, DEAP, and GraffitiVR datasets, it improved on unimodal models by 6.56% to 24.59% and on multimodal approaches by 1.56% to 4.11%. These results indicate that the proposed approach is robust and effective, improving emotion recognition across different environmental settings.

A Method of Multimodal Emotion Recognition in Video Learning Based on Knowledge Enhancement

Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios

A Multimodal Intelligent Emotion Perception Framework by Data-driven and Knowledge-guided

Multimodal Emotional Classification Based on Meaningful Learning

Multimodal Emotion Recognition by Combining Physiological Signals and Facial Expressions: a Preliminary Study

Multimodal Knowledge-enhanced Interactive Network with Mixed Contrastive Learning for Emotion Recognition in Conversation

Multimodal interaction enhanced representation learning for video emotion recognition

Emotion recognition using multimodal deep learning in multiple psychophysiological signals and video

Multimodal Emotion Recognition by Extracting Common and Modality-Specific Information

Multimodal Emotion Recognition and State Analysis of Classroom Video and Audio Based on Deep Neural Network

Multimodal Emotion Recognition Model using Physiological Signals

Multimodal Emotion Recognition based on the Fusion of EEG Signals and Eye Movement Data

Modality- and Subject-Aware Emotion Recognition Using Knowledge Distillation

Multimodal Emotion Recognition From EEG Signals and Facial Expressions

Multimodal Emotion Recognition Based on Cascaded Multichannel and Hierarchical Fusion

Multimodal Latent Emotion Recognition from Micro-expression and Physiological Signals

Multimodal Emotion Recognition based on Facial Expressions, Speech, and EEG

Hybrid Multi-Task Learning for End-To-End Multimodal Emotion Recognition

Optimized Piano Music Education Model Based on Multimodal Information Fusion for Emotion Recognition in Multimedia Video Networks

A multimodal emotion recognition model integrating speech, video and MoCAP