Abstract: As the number of vehicles and the volume of traffic grow in urban centers, cities have seen a corresponding rise in traffic accidents. Proactively identifying accident-prone hotspots in urban environments can help prevent accidents and reduce property damage. This research introduces the Two-Branch Contextual Feature-Guided Converged Network (TCFGC-Net), which uses multimodal satellite and street view data. The model extracts global structural features from satellite imagery and dynamic continuous features from street view imagery in order to improve the accuracy of urban accident hotspot detection. For the satellite imagery branch, we propose the Contextual Feature Coupled Convolutional Neural Network (Trans-CFCCNN), which extracts global spatial features and captures feature correlations across adjacent regions. For the street view imagery branch, we develop the Sequential Feature Recurrent Attention Network (SFRAN), which integrates dynamic scene features captured from successive street view images. We further design the Multi-Branch Feature Adaptive Fusion Structure (MBFAF) to aggregate the features of the two branches for accurate identification of accident hotspots. Experimental results show that the model achieves an overall accuracy of 93.7%. Ablation studies confirm that, relative to the standalone street view and satellite branches, multimodal fusion improves the model's accuracy by 12.05% and 17.86%, respectively. The proposed fusion structure yields a 4.22% increase in accuracy over conventional feature concatenation. The model also outperforms existing deep learning models in overall performance. To provide visual interpretability, we use Class Activation Maps (CAM). The resulting visualizations suggest that the dual-branch fusion model reduces false alarms and directs its focus toward regions more pertinent to accident hotspots. The code and model used in this study for identifying urban traffic accident hotspots are available at https://github.com/gwt-ZJU/TCFGC-Net.
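To make the contrast between feature concatenation and adaptive fusion concrete, the following is a minimal sketch, not the MBFAF structure itself, whose internals the abstract does not specify. It assumes each branch yields a feature vector of equal length and that a hypothetical linear gate (`gate_weights`) scores each branch; the function names and the softmax weighting are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fusion(sat_feat, street_feat):
    """Baseline: join the two branch feature vectors end to end."""
    return list(sat_feat) + list(street_feat)


def adaptive_fusion(sat_feat, street_feat, gate_weights):
    """Weighted sum of two branch features (illustrative only).

    gate_weights is a hypothetical learned vector; here it is simply
    passed in. Each branch gets a scalar score (dot product with the
    gate), and a softmax turns the two scores into branch weights.
    """
    if not (len(sat_feat) == len(street_feat) == len(gate_weights)):
        raise ValueError("all vectors must have the same length")
    scores = [
        sum(w * x for w, x in zip(gate_weights, feat))
        for feat in (sat_feat, street_feat)
    ]
    alpha_sat, alpha_street = softmax(scores)
    fused = [alpha_sat * s + alpha_street * v
             for s, v in zip(sat_feat, street_feat)]
    return fused, (alpha_sat, alpha_street)


if __name__ == "__main__":
    sat = [1.0, 2.0]       # stand-in for satellite branch features
    street = [3.0, 4.0]    # stand-in for street view branch features
    print(concat_fusion(sat, street))
    print(adaptive_fusion(sat, street, gate_weights=[0.0, 0.0]))
```

With an all-zero gate both branches receive equal weight, so the fused vector is the element-wise mean; a trained gate would instead shift weight toward whichever branch is more informative for a given sample, while concatenation leaves that weighting entirely to later layers.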
