Abstract: Driving can take up a substantial part of daily life and frequently triggers negative emotions such as anger or anxiety, which have a significant adverse impact on driving safety as well as on long-term human health. In this work, to identify driver emotions and thereby improve the safety and humanization of intelligent driving, we explore how to model discriminative emotion features from both speech and facial expressions. More specifically, we propose two separate networks: an attention-based network for facial expressions and a lightweight speech emotion network. Audio and video features are then combined at the feature level to construct our multimodal driver emotion recognition model. For audio, we propose a new feature extractor that uses a multi-scale residual structure to extract spectrogram features. For video, preprocessing with Local Binary Pattern Histograms (LBPH) yields a set of frame sequences with a fixed-dimensional feature representation. These features are then fed into a fine-tuned ResNet18 model to capture spatial information. The model is further augmented with a temporal attention module and a Gated Recurrent Unit (GRU), which together produce a highly discriminative video representation. Additionally, we propose an Internet of Vehicles (IoV) platform designed specifically for driver emotion recognition. The platform consists of a sensor layer, a data acquisition and transport layer, a server layer, and a data application layer. Its sensors collect multimodal data from drivers to support the proposed multimodal driver emotion recognition algorithm. The algorithm is evaluated on two multimodal emotion datasets, the Ryerson Audio-Visual Dataset of Emotional Speech and Song (RAVDESS) and the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset, using several performance metrics.
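The abstract names the components of the video branch (per-frame ResNet18 features, a GRU, and a temporal attention module) and states that audio and video are fused at the feature level, but it does not give their exact equations. The following minimal NumPy sketch rests on several assumptions. The GRU runs before attention. Attention is a softmax over per-time-step scores produced by a single vector `w`. Fusion is plain concatenation. Random weights stand in for trained parameters. The 512-dimensional frame features match the pooled output size of a standard ResNet18. The GRU size (128), audio embedding size (256), and all function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_sequence(frames, params):
    """Run a single-layer GRU over per-frame features (T, D); return hidden states (T, H)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    h = np.zeros(bz.shape[0])
    states = []
    for x in frames:
        z = sigmoid(Wz @ x + Uz @ h + bz)               # update gate
        r = sigmoid(Wr @ x + Ur @ h + br)               # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)   # candidate state
        h = (1.0 - z) * h + z * h_tilde                 # interpolate old and candidate state
        states.append(h)
    return np.stack(states)

def temporal_attention_pool(states, w):
    """Score each time step with vector w, softmax over time, return the weighted sum."""
    scores = states @ w
    scores = scores - scores.max()                      # subtract max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()       # attention weights sum to 1
    return alpha @ states, alpha

def fuse(video_vec, audio_vec):
    """Feature-level fusion by concatenation (an assumption, not the paper's stated design)."""
    return np.concatenate([video_vec, audio_vec])

rng = np.random.default_rng(0)
T, D, H, A = 16, 512, 128, 256   # frames, ResNet18 pooled dim, GRU size (hypothetical), audio dim (hypothetical)

def init(shape):
    return rng.standard_normal(shape) * 0.01            # random stand-ins for trained weights

frames = rng.standard_normal((T, D))                    # placeholder per-frame CNN features
params = (init((H, D)), init((H, H)), np.zeros(H),
          init((H, D)), init((H, H)), np.zeros(H),
          init((H, D)), init((H, H)), np.zeros(H))
states = gru_sequence(frames, params)
video_vec, alpha = temporal_attention_pool(states, init(H))
audio_vec = rng.standard_normal(A)                      # placeholder audio-branch embedding
fused = fuse(video_vec, audio_vec)
print(fused.shape)  # (384,)
```

Concatenation keeps the fusion step parameter-free, so any cross-modal interaction would have to be learned by the classifier that consumes `fused`. The paper's actual attention and fusion design may differ.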
Compared with other baseline methods, the proposed multimodal model achieves state-of-the-art results on the RAVDESS and SAVEE datasets, with recognition accuracies of 0.93 and 0.99, respectively. It also achieves precision scores of 0.93 on RAVDESS and 0.99 on SAVEE, along with specificity scores of 0.99 and 1.00, respectively.
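Accuracy, precision, and specificity can all be computed from a multiclass confusion matrix. This sketch assumes that precision and specificity are computed one-vs-rest per class and then averaged uniformly across classes (macro averaging). The abstract does not state which averaging the authors used. The toy labels are invented for illustration and do not come from RAVDESS or SAVEE.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def metrics(cm):
    """Return accuracy, macro precision, and macro specificity (one-vs-rest per class)."""
    total = cm.sum()
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class k but belonging to another class
    fn = cm.sum(axis=1) - tp          # belonging to class k but predicted as another class
    tn = total - tp - fp - fn         # everything else
    accuracy = tp.sum() / total
    precision = np.divide(tp, tp + fp, out=np.zeros_like(tp), where=(tp + fp) > 0)
    specificity = np.divide(tn, tn + fp, out=np.zeros_like(tn), where=(tn + fp) > 0)
    return accuracy, precision.mean(), specificity.mean()

# Toy example with 3 classes (hypothetical labels)
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], n_classes=3)
acc, prec, spec = metrics(cm)
print(round(acc, 4), round(prec, 4), round(spec, 4))  # 0.8333 0.8889 0.9167
```

For each class, the true negatives include every sample from all the other classes. As a result, one-vs-rest specificity tends to be high in multiclass problems, which is consistent with specificity exceeding precision in this toy example.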

A Multi-Modal Driver Emotion Dataset and Study: Including Facial Expressions and Synchronized Physiological Signals

Drivers' Comprehensive Emotion Recognition Based on HAM

Driver Multi-task Emotion Recognition Network Based on Multi-modal Facial Video Analysis

A multimodal psychological, physiological and behavioural dataset for human emotions in driving tasks

Driver Emotion Recognition Involving Multimodal Signals: Electrophysiological Response, Nasal-Tip Temperature, and Vehicle Behavior

A Multimodal Driver Emotion Recognition Algorithm Based on the Audio and Video Signals in Internet of Vehicles Platform

A Spontaneous Driver Emotion Facial Expression (DEFE) Dataset for Intelligent Vehicles: Emotions Triggered by Video-Audio Clips in Driving Scenarios

Driver Emotion Recognition with a Hybrid Attentional Multimodal Fusion Framework

Multimodal Dataset Construction and Validation for Driving-Related Anger: A Wearable Physiological Conduction and Vehicle Driving Data Approach

Global-Local-Feature-Fused Driver Speech Emotion Detection for Intelligent Cockpit in Automated Driving

Multimodal driver emotion recognition using motor activity and facial expressions

Brain-Inspired Driver Emotion Detection for Intelligent Cockpits Based on Real Driving Data

Multimodal Data Collection System for Driver Emotion Recognition Based on Self-Reporting in Real-World Driving

A Multimodal Dataset for Mixed Emotion Recognition

DriveSense: A Multi-modal Emotion Recognition and Regulation System for a Car Driver

Performance Evaluation of Intelligent Driving Emotion Recognition Model based on Synthetic Dataset in Real Scenes

Multi-modal emotion analysis from facial expressions and electroencephalogram

CogEmoNet: A Cognitive-Feature-Augmented Driver Emotion Recognition Model for Smart Cockpit

A multimodal physiological dataset for driving behaviour analysis

DERNet: Driver Emotion Recognition Using Onboard Camera