Abstract: Road safety remains a critical challenge worldwide, with approximately 1.35 million fatalities annually attributed to traffic accidents, often due to human error. As vehicle automation advances to higher levels, challenges remain: driving with automation can cognitively overload drivers who engage in non-driving-related tasks (NDRTs), or lead to drowsiness when driving is the sole task. This creates an urgent need for an effective Driver Monitoring System (DMS) that can evaluate cognitive load and drowsiness in SAE Level-2/3 autonomous driving contexts. In this study, we propose a novel multi-task DMS, termed VDMoE, which leverages RGB video input to monitor driver states non-invasively. By utilizing key facial features to minimize computational load and integrating remote photoplethysmography (rPPG) for physiological insights, our approach enhances detection accuracy while maintaining efficiency. Additionally, we optimize the Mixture-of-Experts (MoE) framework to accommodate multi-modal inputs and improve performance across different tasks. A novel prior-inclusive regularization method is introduced to align model outputs with statistical priors, thus accelerating convergence and mitigating overfitting risks. We validate our method on a new dataset (MCDD), which comprises RGB video and physiological indicators from 42 participants, and on two public datasets. Our findings demonstrate the effectiveness of VDMoE in monitoring driver states, contributing to safer autonomous driving systems. The code and data will be released.

What problem does this paper attempt to address?

The main problem this paper addresses is how to effectively monitor the driver's state, especially cognitive load and fatigue, in conditional autonomous driving (i.e., SAE Level-2/3 automated driving), so as to improve road safety and the reliability of the autonomous driving system. Specifically, the paper proposes a video-based multi-task driver state monitoring system (VDMoE), which monitors the driver's state non-invasively using RGB video input. By using key facial features to reduce the computational load and combining them with remote photoplethysmography (rPPG) to obtain physiological indicators, the method aims to improve detection accuracy while maintaining efficiency. The paper also optimizes the Mixture-of-Experts (MoE) framework so that it can handle multi-modal inputs and improve performance across tasks. To accelerate model convergence and reduce the risk of overfitting, a new prior-inclusive regularization method is introduced. The method is validated on a newly created dataset (MCDD) and two public datasets.

### Specific problem description:

1. **Road safety challenges**: Approximately 1.35 million people die in traffic accidents every year, and these accidents are often caused by human error. Autonomous driving technology can reduce human error, but it brings its own risks: a driver engaged in non-driving-related tasks (NDRTs) may become cognitively overloaded, while a driver whose only task is driving may become fatigued. An effective driver monitoring system (DMS) is therefore needed to assess cognitive load and fatigue.
2. **Limitations of existing DMS**:
   - **Traditional DMS**: These depend on various sensors (such as electrocardiogram sensors and vehicle sensors), which are not always available in SAE Level-2 or Level-3 vehicles.
   - **Physiological signals**: Although they can be effective indicators of the driver's state, they usually require invasive physiological sensors (such as EEG and EOG), which are difficult to deploy in practice.
   - **Image-based methods**: Most methods rely on single-frame detection and lose information about temporal changes.
   - **Multi-task monitoring**: Existing video-based DMS mainly focus on a single task (such as fatigue or distraction detection), whereas the driver's state is usually multi-faceted, with its aspects influencing one another.

### Solutions:

1. **Propose the VDMoE system**: Use RGB video input to achieve multi-task monitoring (fatigue, cognitive load, and physiological indicators) by extracting key facial features and integrating rPPG.
2. **Optimize the MoE framework**: Design a heterogeneous gating mechanism and spatio-temporal expert separation to adapt to multi-modal inputs and improve multi-task performance.
3. **Introduce prior-inclusive regularization**: Use regularization to align the probability distribution of the model output with the statistical prior, accelerating convergence and reducing the risk of overfitting.
4. **Create a new dataset**: A dataset (MCDD) covering 42 participants was collected through driving simulator experiments to verify the effectiveness of the method.

### Main contributions:

1. A multi-task driver state monitoring method (VDMoE) based on RGB video is proposed. It focuses on evaluating fatigue and physiological responses in the context of autonomous driving while considering the influence of multi-dimensional cognitive load.
2. Key facial features are combined with STMap, which supplements physiological features, to balance estimation accuracy and efficiency.
3. The MoE structure is optimized to adapt to multi-modal and spatio-temporal inputs and is made lightweight through an MLP network.
4. A new prior-inclusive regularization method is introduced, which improves the learning and generalization ability of the model.
5. The first multi-modal cognitive load and fatigue driving dataset (MCDD) applicable to the L3 autonomous driving context is created, containing data from 42 participants.
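
The idea behind prior-inclusive regularization, aligning the distribution of model outputs with a statistical prior, can be illustrated with a small sketch. The summary does not give the paper's actual formulation, so everything here is an assumption made for illustration: the penalty is taken to be a KL divergence between a class prior and the batch-averaged prediction, and the names `prior_regularized_loss`, `weight`, and the example class frequencies are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label] + 1e-12)

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(
        pi * math.log((pi + 1e-12) / (qi + 1e-12))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def prior_regularized_loss(batch_logits, labels, prior, weight=0.1):
    """Mean cross-entropy plus weight * KL(prior || batch-mean prediction).

    ASSUMPTION: the KL form, its direction, and `weight` are illustrative
    choices, not the formulation used in the paper.
    """
    probs = [softmax(logits) for logits in batch_logits]
    ce = sum(cross_entropy(p, y) for p, y in zip(probs, labels)) / len(labels)
    # Average the predicted distributions over the batch, class by class.
    mean_pred = [sum(p[k] for p in probs) / len(probs) for k in range(len(prior))]
    return ce + weight * kl_divergence(prior, mean_pred)

# Hypothetical two-class example (e.g. alert vs. drowsy) with made-up numbers.
prior = [0.7, 0.3]
batch_logits = [[2.0, 0.0], [0.5, 1.5], [1.0, 1.0]]
labels = [0, 1, 0]
loss = prior_regularized_loss(batch_logits, labels, prior)
```

In this sketch the penalty vanishes when the batch-averaged prediction matches the prior and grows as the two diverge, which is one plausible way such a term could discourage a model from drifting toward implausible output distributions early in training.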

Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving
