Abstract: Distracted driving is a leading cause of road accidents globally. Identification of distracted driving involves reliably detecting and classifying various forms of driver distraction (e.g., texting, eating, or using in-car devices) from in-vehicle camera feeds to enhance road safety. This task is challenging due to the need for robust models that can generalize to a diverse set of driver behaviors without requiring extensive annotated datasets. In this paper, we propose KiD3, a novel method for distracted driver detection (DDD) that infuses auxiliary knowledge about semantic relations between entities in a scene and the structural configuration of the driver's pose. Specifically, we construct a unified framework that integrates scene graphs and driver pose information with the visual cues in video frames to create a holistic representation of the driver's actions. Our results indicate that KiD3 achieves a 13.64% accuracy improvement over the vision-only baseline by incorporating such auxiliary knowledge with visual information.

What problem does this paper attempt to address?

This paper addresses distracted driver detection (DDD): the detection and classification of distracted driving. Specifically, the authors aim to detect and classify various forms of driver distraction (such as texting, eating, or using in-vehicle devices) more reliably from in-vehicle camera video by integrating auxiliary knowledge, namely scene graphs and the driver's pose information. This task is crucial for enhancing road safety.

### Main problems

1. **Reliable detection and classification**: Identifying distracted driving requires accurately detecting and classifying different types of driver distraction.
2. **Model generalization ability**: The model needs to remain robust across diverse driver behaviors without the support of large labeled datasets.
3. **Improving detection accuracy**: Detection accuracy should improve through auxiliary knowledge rather than through complex, high-parameter models.

### Solutions proposed in the paper

To solve the above problems, the paper proposes a new method named KiD3. KiD3 achieves its improvement in the following ways:

- **Introducing auxiliary knowledge**: Integrate scene graphs and the driver's pose information into a unified framework, thereby enriching the overall representation of driver behavior.
- **Multi-modal information fusion**: Combine visual cues, scene semantic relationships, and the structural configuration of the driver's pose to form a more comprehensive behavior representation.
- **Simplified but effective architecture**: Adopt a simple method that significantly improves detection performance without increasing the computational burden.

### Experimental results

The experimental results show that KiD3 outperforms the baseline model that relies only on visual information on a real-world dataset. Specifically, KiD3 improves accuracy by 13.64% over the vision-only model, and the F1 score also increases. This indicates that introducing auxiliary knowledge can effectively improve distracted driver detection and contribute to a safer driving environment. Through these improvements, KiD3 provides a more reliable, efficient, and scalable solution that can significantly improve distracted driver detection without relying on expensive high-parameter models.
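The multi-modal fusion idea described above can be illustrated with a minimal sketch. It assumes that each modality (visual cues, scene graph, driver pose) has already been encoded as a fixed-length feature vector, and it fuses them by simple concatenation followed by a linear classifier. The summary does not state how KiD3 actually combines the modalities, so the fusion strategy, the function names (`fuse_features`, `classify`), the class labels, and all numeric values here are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical label set for illustration; the source does not list its classes.
CLASSES = ["safe_driving", "texting", "eating"]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so no modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(
    visual: Sequence[float],
    scene_graph: Sequence[float],
    pose: Sequence[float],
) -> List[float]:
    """Fuse modalities by concatenation (an assumption, not the paper's
    confirmed method). Each part is normalized before being appended."""
    fused: List[float] = []
    for part in (visual, scene_graph, pose):
        fused.extend(l2_normalize(part))
    return fused


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    fused: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> Dict[str, float]:
    """Linear classifier over the fused vector; `weights` has one row per class."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return dict(zip(CLASSES, softmax(logits)))


if __name__ == "__main__":
    # Toy feature vectors standing in for real encoder outputs.
    visual = [0.2, 0.9, 0.1, 0.4]
    scene_graph = [1.0, 0.0, 1.0]
    pose = [0.5, 0.5]

    fused = fuse_features(visual, scene_graph, pose)
    # Untrained, hand-picked weights; a real model would learn these.
    weights = [[0.1] * len(fused), [0.3] * len(fused), [-0.2] * len(fused)]
    biases = [0.0, 0.0, 0.0]

    scores = classify(fused, weights, biases)
    print(max(scores, key=scores.get), scores)
```

Concatenation is used here because it is the simplest way to let a single classifier see all three sources of evidence at once. Other fusion strategies (for example attention-based or graph-based fusion) would slot into `fuse_features` without changing the rest of the sketch.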

Towards Infusing Auxiliary Knowledge for Distracted Driver Detection
