Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving

Jiyao Wang, Xiao Yang, Zhenyu Wang, Ximeng Wei, Ange Wang, Dengbo He, Kaishun Wu
2024-10-28
Abstract: Road safety remains a critical challenge worldwide, with approximately 1.35 million fatalities annually attributed to traffic accidents, often due to human errors. As we advance towards higher levels of vehicle automation, challenges still exist, as driving with automation can cognitively over-demand drivers if they engage in non-driving-related tasks (NDRTs), or lead to drowsiness if driving was the sole task. This calls for the urgent need for an effective Driver Monitoring System (DMS) that can evaluate cognitive load and drowsiness in SAE Level-2/3 autonomous driving contexts. In this study, we propose a novel multi-task DMS, termed VDMoE, which leverages RGB video input to monitor driver states non-invasively. By utilizing key facial features to minimize computational load and integrating remote Photoplethysmography (rPPG) for physiological insights, our approach enhances detection accuracy while maintaining efficiency. Additionally, we optimize the Mixture-of-Experts (MoE) framework to accommodate multi-modal inputs and improve performance across different tasks. A novel prior-inclusive regularization method is introduced to align model outputs with statistical priors, thus accelerating convergence and mitigating overfitting risks. We validate our method with the creation of a new dataset (MCDD), which comprises RGB video and physiological indicators from 42 participants, and two public datasets. Our findings demonstrate the effectiveness of VDMoE in monitoring driver states, contributing to safer autonomous driving systems. The code and data will be released.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to monitor the driver's state effectively, in particular cognitive load and fatigue, in conditional autonomous driving (SAE Level-2/3), so as to improve road safety and the reliability of the automated driving system. It proposes a video-based multi-task driver state monitoring system (VDMoE) that monitors the driver non-invasively from RGB video input. By using key facial features to reduce the computational load and combining them with remote photoplethysmography (rPPG) to obtain physiological indicators, the method aims to improve detection accuracy while maintaining efficiency. The paper also optimizes the Mixture-of-Experts (MoE) framework so that it can handle multi-modal inputs and perform better across tasks. A new prior-inclusive regularization method is introduced to accelerate convergence and reduce the risk of overfitting. The method is validated on a newly created dataset (MCDD) and two public datasets.

### Specific problem description:

1. **Road safety challenges**: Approximately 1.35 million people die in traffic accidents every year, often because of human error. Vehicle automation can reduce human error, but it brings its own risks: a driver engaged in non-driving-related tasks (NDRTs) may become cognitively overloaded, while a driver whose only task is driving may become fatigued. An effective driver monitoring system (DMS) is therefore needed to assess cognitive load and fatigue.
2. **Limitations of existing DMS**:
   - **Traditional DMS**: These rely on various sensors (such as electrocardiogram sensors and vehicle sensors), which are not always available in SAE Level-2 or Level-3 vehicles.
   - **Physiological signals**: Although they are effective indicators of the driver's state, they usually require invasive physiological sensors (such as EEG and EOG), which are difficult to deploy in practice.
   - **Image-based methods**: Most methods perform single-frame detection and therefore lose temporal information.
   - **Multi-task monitoring**: Existing video-based DMS mainly focus on a single task (such as fatigue or distraction detection), whereas the driver's state is multi-faceted and its aspects influence one another.

### Solutions:

1. **Propose the VDMoE system**: Use RGB video input to monitor multiple targets (fatigue, cognitive load, and physiological indicators) by extracting key facial features and integrating rPPG.
2. **Optimize the MoE framework**: Design a heterogeneous gating mechanism and separate spatial and temporal experts to accommodate multi-modal inputs and improve multi-task performance.
3. **Introduce prior-inclusive regularization**: Use a regularization term to align the probability distribution of the model output with the statistical prior, which accelerates convergence and reduces the risk of overfitting.
4. **Create a new dataset**: A dataset (MCDD) covering 42 participants was collected in driving simulator experiments to verify the effectiveness of the method.

### Main contributions:

1. A multi-task driver state monitoring method (VDMoE) based on RGB video is proposed. It focuses on evaluating fatigue and physiological responses in the context of autonomous driving while considering the influence of multi-dimensional cognitive load.
2. Key facial features, supplemented by physiological features from the STMap, are used to balance estimation accuracy and efficiency.
3. The MoE structure is optimized to handle multi-modal and spatio-temporal inputs and is made lightweight through an MLP network.
4. A new prior-inclusive regularization method is introduced, which improves the model's learning and generalization ability.
5. The first multi-modal cognitive load and fatigue driving dataset (MCDD) applicable to the L3 autonomous driving context is created, containing data from 42 participants.
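
The prior-inclusive regularization summarized above is described only as aligning the model's output distribution with a statistical prior. As a rough illustration of that idea, the sketch below adds a penalty to a standard cross-entropy loss. Everything specific in it is an assumption and is not taken from the paper: the KL-divergence form of the penalty, the averaging of predictions over a batch, the function names, the `weight` value, and the toy two-class data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, labels, eps=1e-12):
    """Mean negative log-likelihood of the true class over the batch."""
    losses = [-math.log(softmax(row)[y] + eps)
              for row, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


def prior_penalty(batch_logits, prior, eps=1e-12):
    """ASSUMED form: KL(prior || batch-mean predicted distribution).

    The paper's actual regularizer may differ; this is one plausible way
    to pull predicted class frequencies towards a known prior.
    """
    probs = [softmax(row) for row in batch_logits]
    mean_pred = [sum(p[k] for p in probs) / len(probs)
                 for k in range(len(prior))]
    return sum(q * math.log((q + eps) / (m + eps))
               for q, m in zip(prior, mean_pred) if q > 0)


def total_loss(batch_logits, labels, prior, weight=0.1):
    """Task loss plus the weighted prior penalty (weight is hypothetical)."""
    return (cross_entropy(batch_logits, labels)
            + weight * prior_penalty(batch_logits, prior))


# Hypothetical two-class example (e.g. alert vs. fatigued) with a uniform prior.
prior = [0.5, 0.5]
labels = [0, 1]
balanced = [[2.0, 0.0], [0.0, 2.0]]  # predictions average out to the prior
skewed = [[2.0, 0.0], [3.0, 0.0]]    # predictions collapse onto class 0

print("penalty (balanced):", prior_penalty(balanced, prior))
print("penalty (skewed):  ", prior_penalty(skewed, prior))
print("total loss (skewed):", total_loss(skewed, labels, prior))
```

In this sketch the penalty is near zero when the batch's average prediction matches the prior and grows as predictions collapse onto one class, which is one way such a term could discourage overfitting to a dominant label.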