Abstract: Driving can take up a substantial part of daily life and frequently triggers negative emotions such as anger or anxiety, which have a significant adverse impact on driving safety and on long-term human health. To identify driver emotions, and thereby improve the safety and humanization of intelligent driving, this work explores how to model discriminative emotion features from both speech and facial expressions. More specifically, we propose an effective attention-based network for facial expressions and a separate lightweight speech emotion network. The audio and video features are then combined at the feature level to construct our multimodal driver emotion recognition model. For audio, this paper proposes a new feature extractor that uses a multi-scale residual structure to extract spectrogram features. For video, preprocessing with Local Binary Pattern Histograms (LBPH) produces a set of frame sequences with a fixed-dimensional feature representation. These features are then input into a fine-tuned ResNet18 model to analyze spatial information, and the model is further augmented with a temporal attention module and a Gated Recurrent Unit (GRU) to produce a highly discriminative video representation. Additionally, we propose an Internet of Vehicles (IoV) platform specifically designed for driver emotion recognition. The IoV platform consists of a sensor layer, a data acquisition and transport layer, a server layer, and a data application layer; it uses sensors to collect multimodal data from drivers, providing data support for the proposed multimodal driver emotion recognition algorithm. The performance of the proposed algorithm is evaluated on two multimodal emotional datasets, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and Surrey Audio-Visual Expressed Emotion (SAVEE), using a variety of performance indicators.
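The video branch chains several standard pieces. The following is a minimal, dependency-free Python sketch of three of them under stated assumptions, not the authors' implementation: a basic 3x3 LBP histogram per frame, softmax temporal attention pooling over frames, and feature-level fusion by concatenation. The abstract does not give the LBPH radius, neighbour count, or cell grid (practical LBPH descriptors usually concatenate histograms from a grid of cells), so a single global 256-bin histogram is assumed. The ResNet18 and GRU stages are omitted, so attention is applied directly to the histograms here; `query` stands in for a hypothetical learned attention vector, and the audio embedding is a placeholder.

```python
from math import exp
from typing import List, Sequence, Tuple

# A grayscale frame as rows of pixel intensities (toy stand-in for a face crop).
Image = List[List[int]]

# 8 neighbours of a 3x3 window, clockwise from the top-left; this bit order is one common choice.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(frame: Image) -> List[float]:
    """Basic 3x3 LBP codes over interior pixels, pooled into a normalized 256-bin histogram.

    The output length is 256 regardless of frame size, i.e. a fixed-dimensional descriptor.
    """
    hist = [0.0] * 256
    for r in range(1, len(frame) - 1):
        for c in range(1, len(frame[0]) - 1):
            center = frame[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set the bit when the neighbour is at least as bright as the centre pixel.
                if frame[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def temporal_attention_pool(
    frame_feats: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Score each frame against a query vector, softmax over time, return the weighted sum."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in frame_feats]
    weights = softmax(scores)
    dim = len(frame_feats[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, frame_feats)) for d in range(dim)]
    return pooled, weights


def fuse_features(audio_vec: Sequence[float], video_vec: Sequence[float]) -> List[float]:
    """Feature-level (early) fusion: concatenate the two modality embeddings."""
    return list(audio_vec) + list(video_vec)


# Toy demo: three 4x4 frames, a hypothetical all-zero query (uniform attention),
# and a hypothetical 4-dimensional audio embedding.
frames = [
    [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120], [130, 140, 150, 160]],
    [[5] * 4 for _ in range(4)],
    [[160, 150, 140, 130], [120, 110, 100, 90], [80, 70, 60, 50], [40, 30, 20, 10]],
]
video_vec, attn = temporal_attention_pool([lbp_histogram(f) for f in frames], [0.0] * 256)
fused = fuse_features([0.1, 0.2, 0.3, 0.4], video_vec)
print(len(fused), [round(w, 3) for w in attn])
```

Concatenation is the simplest form of feature-level fusion; the fused vector would then feed a classifier. With an all-zero query every frame receives equal weight, so the pooled vector is just the mean histogram; a trained query would instead up-weight the frames that carry the most emotional information.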
Compared with other baseline methods, the proposed multimodal model achieves state-of-the-art results on the RAVDESS and SAVEE datasets, with recognition accuracies of 0.93 and 0.99, respectively. It also attains precision scores of 0.93 on RAVDESS and 0.99 on SAVEE, along with specificity scores of 0.99 and 1.00, respectively.
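The abstract reports accuracy, precision, and specificity but does not say how per-class values are averaged. The sketch below assumes one-vs-rest counts taken from a multiclass confusion matrix with unweighted (macro) averaging; both choices are assumptions, and the toy labels are illustrative only.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> Matrix:
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm: Matrix) -> float:
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def macro_precision(cm: Matrix) -> float:
    k = len(cm)
    scores = []
    for c in range(k):
        predicted_c = sum(cm[r][c] for r in range(k))  # column sum = TP + FP
        scores.append(cm[c][c] / predicted_c if predicted_c else 0.0)
    return sum(scores) / k


def macro_specificity(cm: Matrix) -> float:
    k = len(cm)
    total = sum(sum(row) for row in cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                        # truly c, predicted another class
        tn = total - tp - fp - fn                   # neither truly nor predicted c
        scores.append(tn / (tn + fp) if tn + fp else 0.0)
    return sum(scores) / k


# Toy 3-class example (labels are illustrative, not from RAVDESS or SAVEE).
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(round(accuracy(cm), 3), round(macro_precision(cm), 3), round(macro_specificity(cm), 3))
# prints: 0.667 0.722 0.833
```

In a one-vs-rest setting each class's negatives include every other class, so true negatives dominate; this is one reason specificity tends to sit above accuracy and precision on multiclass emotion data, as in the reported figures.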

Global-Local-Feature-Fused Driver Speech Emotion Detection for Intelligent Cockpit in Automated Driving

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

Brain-Inspired Driver Emotion Detection for Intelligent Cockpits Based on Real Driving Data

AI-Enabled Intelligent Cockpit Proactive Affective Interaction: Middle-Level Feature Fusion Dual-Branch Deep Learning Network for Driver Emotion Recognition

CogEmoNet: A Cognitive-Feature-Augmented Driver Emotion Recognition Model for Smart Cockpit

Deep Spectrum Feature Representations for Speech Emotion Recognition

An Intra- and Inter-Emotion Transformer-Based Fusion Model with Homogeneous and Diverse Constraints Using Multi-Emotional Audiovisual Features for Depression Detection

Driver’s Speech Emotion Recognition for Smart Cockpit Based on a Self-Attention Deep Learning Framework

Drivers' Comprehensive Emotion Recognition Based on HAM

A Multi-Modal Driver Emotion Dataset and Study: Including Facial Expressions and Synchronized Physiological Signals

Intelligent In-Car Emotion Regulation Interaction System Based on Speech Emotion Recognition

Driver Emotion Recognition Involving Multimodal Signals: Electrophysiological Response, Nasal-Tip Temperature, and Vehicle Behavior

Research on Emotion Recognition Method of Flight Training Based on Multimodal Fusion

A Multimodal Driver Emotion Recognition Algorithm Based on the Audio and Video Signals in Internet of Vehicles Platform

Performance Evaluation of Intelligent Driving Emotion Recognition Model based on Synthetic Dataset in Real Scenes

Multi-feature Fusion Speech Emotion Recognition Based on SVM

DriveSense: A Multi-modal Emotion Recognition and Regulation System for a Car Driver

An autoencoder-based feature level fusion for speech emotion recognition

Driver Emotion Recognition with a Hybrid Attentional Multimodal Fusion Framework

Driver Multi-task Emotion Recognition Network Based on Multi-modal Facial Video Analysis

A Feature Fusion Model with Data Augmentation for Speech Emotion Recognition