Abstract: Real-time assessment and short-term warning of driving risks are critical for AI-assisted vehicles to significantly improve the safety and reliability of mobility. However, existing methods do not comprehensively consider the multiple human, vehicle, and road factors involved, which makes accurate risk assessment difficult. To address this problem, this paper proposes a new driving risk assessment framework that integrates multimodal data. First, we collected multimodal data covering human-vehicle-road factors through naturalistic driving experiments. Then, using the Latent Dirichlet Allocation (LDA) model, we identified three risk levels from driving behavior features: normal driving, longitudinal risky driving, and lateral risky driving. To better understand the spatiotemporal importance of multiple factors, we developed a spatiotemporal dual-channel neural network based on a multi-layer attention mechanism (MLA-DCNN). Its dual-channel structure integrates "low-level" historical sequences with "high-level" statistical features extracted from multiple factors. The model also uses three attention layers, which capture differences among features in the temporal, spatial, and feature-extraction-level dimensions, respectively. Results show that the LDA model uncovers latent driving-risk patterns more effectively than traditional clustering methods. The proposed model achieved an accuracy of 91.04%, outperforming the alternative models. In addition, the multi-layer attention improves the interpretability of the model and captures the spatiotemporal importance of different factors across various road environments. The method can be applied to connected and automated vehicles (CAVs) using multimodal naturalistic driving data collected by in-vehicle sensors.
It enhances the risk warning capabilities of driving assistance systems, and the multidimensional importance analysis also supports decision-making by traffic management authorities.
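The abstract does not give the MLA-DCNN's layer sizes, scoring functions, or exactly how its three attention layers are wired. The following is a minimal NumPy sketch of one possible arrangement. It assumes standard additive (tanh) attention for all three layers, a hypothetical 20-step window of 6 factors, mean/std/max window statistics as the "high-level" channel, a factor-wise reading of "spatial" attention, and random, untrained weights. Only the three risk-level labels come from the text; every other name and dimension is illustrative, not the authors' implementation.

```python
import numpy as np

# Risk-level labels from the abstract; everything else is a hypothetical sketch.
RISK_LEVELS = ["normal", "longitudinal risky", "lateral risky"]


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def additive_attention(h, w, v):
    """Additive attention over the rows of h (n x d).

    Scores each row with v^T tanh(h_i W), normalizes the scores with softmax,
    and returns (weights of shape (n,), weighted sum of rows of shape (d,)).
    """
    scores = np.tanh(h @ w) @ v
    alpha = softmax(scores)
    return alpha, alpha @ h


rng = np.random.default_rng(0)
T, F, D_ATT = 20, 6, 8  # hypothetical: 20 time steps, 6 factors, attention size 8

# "Low-level" channel input: history of F factors over T steps (random stand-in data).
x_seq = rng.normal(size=(T, F))

# Layer 1, temporal attention: one weight per time step.
alpha_t, seq_vec = additive_attention(
    x_seq, rng.normal(size=(F, D_ATT)), rng.normal(size=D_ATT))  # seq_vec: (F,)

# Layer 2, factor ("spatial") attention: each factor's time series is one row.
alpha_f, _ = additive_attention(
    x_seq.T, rng.normal(size=(T, D_ATT)), rng.normal(size=D_ATT))  # alpha_f: (F,)
seq_repr = alpha_f * seq_vec  # rescale each factor by its attention weight

# "High-level" channel: assumed window statistics, projected to F dimensions.
stats = np.concatenate([x_seq.mean(axis=0), x_seq.std(axis=0), x_seq.max(axis=0)])
stat_repr = np.tanh(stats @ rng.normal(size=(3 * F, F)))  # (F,)

# Layer 3, level attention: weigh the two channels against each other.
channels = np.stack([seq_repr, stat_repr])  # (2, F)
alpha_l, fused = additive_attention(
    channels, rng.normal(size=(F, D_ATT)), rng.normal(size=D_ATT))  # fused: (F,)

# Untrained linear head producing probabilities over the three risk levels.
probs = softmax(fused @ rng.normal(size=(F, 3)))
predicted = RISK_LEVELS[int(np.argmax(probs))]

print("temporal weights:", np.round(alpha_t, 3))
print("factor weights:", np.round(alpha_f, 3))
print("channel weights:", np.round(alpha_l, 3))
print("risk probabilities:", {k: round(float(p), 3) for k, p in zip(RISK_LEVELS, probs)})
print("predicted:", predicted)
```

In a trained model these weight matrices would be learned from labeled windows. Reading off `alpha_t`, `alpha_f`, and `alpha_l` is one way the per-time-step, per-factor, and per-channel importance that the abstract credits to the multi-layer attention could be inspected.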

Multi-scale space-time transformer for driving behavior detection

Beyond Traditional Driving Scenes: A Robotic-Centric Paradigm for 2D+3D Human Tracking Using Siamese Transformer Network

Dynamic-learning Spatial-Temporal Transformer Network for Vehicular Trajectory Prediction at Urban Intersections

DSDFormer: An Innovative Transformer-Mamba Framework for Robust High-Precision Driver Distraction Identification

DS-Trans: A 3D Object Detection Method Based on a Deformable Spatiotemporal Transformer for Autonomous Vehicles

Multimodal driver distraction detection using dual-channel network of CNN and Transformer

MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition

TransDARC: Transformer-based Driver Activity Recognition with Latent Space Feature Calibration

A Multimodal Data-Driven Approach for Driving Risk Assessment

Multi-scale Temporal Fusion Transformer for Incomplete Vehicle Trajectory Prediction

MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying

MmSTCT: spatial–temporal convolution transformer network considering driving intention for multimodal vehicle trajectory prediction of highway

M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving

Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition

DRUformer: Enhancing Driving Scene Important Object Detection With Driving Scene Relationship Understanding

ControlMTR: Control-Guided Motion Transformer with Scene-Compliant Intention Points for Feasible Motion Prediction

DRUformer: Enhancing the driving scene Important object detection with driving relationship self-understanding

Pose-guided multi-task video transformer for driver action recognition

ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection

Multi Self-supervised Pre-fine-tuned Transformer Fusion for Better Intelligent Transportation Detection

Lane Detection Transformer Based on Multi-frame Horizontal and Vertical Attention and Visual Transformer Module