Driver Multi-task Emotion Recognition Network Based on Multi-modal Facial Video Analysis

Guoliang Xiang, Song Yao, Xianhui Wu, Hanwen Deng, Guojie Wang, Yu Liu, Fan Li, Yong Peng
DOI: https://doi.org/10.1016/j.patcog.2024.111241
Impact factor: 8
2024-12-04
Pattern Recognition
Abstract: Driver emotion recognition is crucial for improving safety and user experience in driving scenarios. However, current emotion recognition methods often rely on a single modality and a single-task setup, which leads to suboptimal performance in driving scenarios. To address this, this paper proposes a driver multitask emotion recognition method based on multimodal facial video analysis (MER-MFVA). The method extracts facial expression features and remote photoplethysmography (rPPG) signals from driver facial videos. The facial expression features, which include facial action units and eye movement information, represent the driver's external characteristics. The rPPG information represents the driver's internal characteristics and is enhanced by a purpose-designed dual-path Transformer network together with a focus module. We also propose a cross-modal mutual attention mechanism that fuses the multimodal features by computing mutual attention between the facial expression features and the rPPG information. For the final task output, we employ multitask learning, setting discrete emotion recognition as the primary task and emotion valence recognition, emotion arousal recognition, and the preceding rPPG extraction step as auxiliary tasks, so that information is shared effectively across tasks. Experimental results on the established driver emotion dataset show that the proposed method significantly improves driver emotion recognition, achieving an accuracy of 86.98% and an F1 score of 85.83% on the primary task, which validates the effectiveness of the proposed approach.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
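The abstract names two mechanisms without giving their formulas: mutual attention between facial-expression features and rPPG features, and a multitask objective with one primary and several auxiliary tasks. The sketch below illustrates one common way to realize each, under stated assumptions; it is not the authors' implementation. Assumptions: both modalities are sequences of vectors with the same feature dimension; "mutual attention" is modeled as plain bidirectional scaled dot-product cross-attention with no learned projections; the attended sequences are mean-pooled and concatenated; and the multitask loss is a weighted sum whose task names and weights are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    # Numerically stable softmax applied to each row independently.
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(query: Matrix, context: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row attends over the context rows.

    Assumption: keys and values are the context features themselves
    (a learned model would first apply Q/K/V projections)."""
    scale = 1.0 / math.sqrt(len(query[0]))
    scores = matmul(query, transpose(context))
    weights = softmax_rows([[s * scale for s in row] for row in scores])
    return matmul(weights, context)  # shape: len(query) x d


def mean_pool(x: Matrix) -> List[float]:
    # Average over the time axis to get one vector per sequence.
    return [sum(col) / len(x) for col in zip(*x)]


def mutual_attention_fusion(face: Matrix, rppg: Matrix) -> List[float]:
    """Hypothetical fusion: attend in both directions, pool, and concatenate."""
    face_to_rppg = cross_attention(face, rppg)  # facial queries read rPPG features
    rppg_to_face = cross_attention(rppg, face)  # rPPG queries read facial features
    return mean_pool(face_to_rppg) + mean_pool(rppg_to_face)  # length 2 * d


def multitask_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    # Weighted sum of per-task losses; the weights are illustrative only.
    return sum(weights[name] * value for name, value in losses.items())
```

For example, with two-dimensional features, `mutual_attention_fusion([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])` returns a four-element fused vector. A call such as `multitask_loss({"emotion": 1.2, "valence": 0.5, "arousal": 0.4, "rppg": 0.1}, {"emotion": 1.0, "valence": 0.5, "arousal": 0.5, "rppg": 0.2})` gives the primary discrete-emotion task the largest weight, mirroring the primary/auxiliary split described in the abstract. How the paper actually balances the tasks is not stated here.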