Abstract: This paper presents a novel approach to emotion recognition (ER) based on electroencephalogram (EEG), electromyogram (EMG), and electrocardiogram (ECG) signals together with computer vision. The proposed system comprises two models, one for physiological signals and one for facial expressions, deployed in a real-time embedded system. A custom dataset of EEG, ECG, EMG, and facial-expression recordings was collected from 10 participants using an Affective Video Response System. Time-, frequency-, and wavelet-domain features were extracted and optimized based on visualizations from Exploratory Data Analysis (EDA) and Principal Component Analysis (PCA). Local Binary Patterns (LBP), Local Ternary Patterns (LTP), Histogram of Oriented Gradients (HOG), and Gabor descriptors were used to differentiate facial emotions. Classification models, namely decision tree, random forest, and optimized variants thereof, were trained on these features. For the physiological-signal model, the optimized random forest achieved an accuracy of 84% and the optimized decision tree achieved 76%. The facial emotion recognition (FER) model attained accuracies of 84.6%, 74.3%, 67%, and 64.5% using K-Nearest Neighbors (KNN), random forest, decision tree, and XGBoost, respectively. Performance metrics, including the F1 score, the Receiver Operating Characteristic (ROC) curve, and the Area Under the Curve (AUC), were computed to evaluate the models. The outputs of the bio-signal model and the facial emotion model are passed to a voting classifier, which fuses them to produce the final emotion. A comprehensive report is then generated from the resulting emotion using a Generative Pretrained Transformer (GPT) language model, achieving an accuracy of 87.5%. The model was implemented and deployed on a Jetson Nano. The results demonstrate the system's relevance to ER, with potential applications in prosthetic systems, psychological therapy, rehabilitation, assistance for individuals with neurological disorders, mental health monitoring, and biometric security.
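
The late-fusion step described in the abstract, where the outputs of the bio-signal model and the facial emotion model are combined by a voting classifier, can be illustrated with a minimal sketch. The abstract does not state the voting rule, the emotion classes, or any weights, so everything here is an assumption: the sketch uses weighted soft voting over class probabilities, and the label set `EMOTIONS`, the function name `soft_vote`, and the model names `physio` and `face` are hypothetical.

```python
from typing import Dict, Sequence

# Hypothetical label set; the abstract does not list the emotion classes.
EMOTIONS = ("happy", "sad", "angry", "neutral")


def soft_vote(prob_by_model: Dict[str, Sequence[float]],
              weights: Dict[str, float]) -> str:
    """Fuse per-model class probabilities by weighted averaging.

    Assumption: soft voting. The paper's voting classifier may instead
    use hard (majority) voting or a different weighting scheme.
    """
    total_weight = sum(weights[name] for name in prob_by_model)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * len(EMOTIONS)
    for name, probs in prob_by_model.items():
        if len(probs) != len(EMOTIONS):
            raise ValueError(f"model {name!r} returned {len(probs)} classes")
        for i, p in enumerate(probs):
            # Normalise by the total weight so the fused scores sum to 1
            # whenever each input distribution sums to 1.
            fused[i] += weights[name] * p / total_weight

    # Return the label with the highest fused probability.
    best = max(range(len(EMOTIONS)), key=fused.__getitem__)
    return EMOTIONS[best]


if __name__ == "__main__":
    # Illustrative probabilities only; these are not results from the paper.
    predictions = {
        "physio": [0.1, 0.6, 0.2, 0.1],  # e.g. random forest on EEG/ECG/EMG features
        "face": [0.5, 0.3, 0.1, 0.1],    # e.g. KNN on LBP/HOG descriptors
    }
    print(soft_vote(predictions, {"physio": 0.5, "face": 0.5}))
```

Weighting the two modalities equally is only one possible choice. Weights could instead reflect each model's validation accuracy, which would let the stronger modality dominate when the two models disagree.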

A Unified Biosensor–Vision Multi-Modal Transformer network for emotion recognition

A Unified Transformer-based Network for multimodal Emotion Recognition

Multimodal Neurophysiological Transformer for Emotion Recognition

Multi-scale Transformer-based Network for Emotion Recognition from Multi Physiological Signals

Transformer-Based Self-Supervised Learning for Emotion Recognition

Emotion Recognition with Pre-Trained Transformers Using Multimodal Signals

Transformer-Based Multimodal Emotional Perception for Dynamic Facial Expression Recognition in the Wild

EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition

An End-to-End Transformer with Progressive Tri-Modal Attention for Multi-modal Emotion Recognition

Multimodal emotion recognition based on the fusion of vision, EEG, ECG, and EMG signals

Multi-Modal Emotion Recognition by Text, Speech and Video Using Pretrained Transformers

Transformer Based Multimodal Speech Emotion Recognition with Improved Neural Networks

Emotion Recognition Using Transformers with Masked Learning

Bi-Modal Bi-Task Emotion Recognition Based on Transformer Architecture

End-to-End Multimodal Emotion Recognition Based on Facial Expressions and Remote Photoplethysmography Signals

Temporal aware Mixed Attention-based Convolution and Transformer Network for cross-subject EEG emotion recognition

Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention

Multi-Task Transformer with uncertainty modelling for Face Based Affective Computing

Multimodal Adaptive Emotion Transformer with Flexible Modality Inputs on A Novel Dataset with Continuous Labels

Multilevel Transformer For Multimodal Emotion Recognition

Multimodal Transformer Fusion for Emotion Recognition: A Survey