Abstract: As vehicle numbers and traffic volumes grow in urban centers, cities have experienced a concomitant increase in traffic accidents. Proactively identifying accident-prone hotspots in urban environments could help prevent traffic accidents and reduce property damage. This research introduces the Two-Branch Contextual Feature-Guided Converged Network (TCFGC-Net), which uses multimodal satellite and street view data. The model extracts global structural features from satellite imagery and dynamic, continuous features from street view imagery in order to improve the accuracy of urban accident hotspot detection. For the satellite imagery branch, we propose the Contextual Feature Coupled Convolutional Neural Network (Trans-CFCCNN), designed to extract global spatial features and capture feature correlations across adjacent regions. For the street view imagery branch, we develop the Sequential Feature Recurrent Attention Network (SFRAN) to integrate dynamic scene features captured from successive street view images. We also design the Multi-Branch Feature Adaptive Fusion Structure (MBFAF) to aggregate the features of the two branches for accurate identification of accident hotspots. Experimental results show that the model achieves an overall accuracy of 93.7 %. Ablation studies confirm that multimodal fusion improves accuracy by 12.05 % and 17.86 % relative to the standalone street view and satellite branches, respectively. The proposed fusion structure yields a 4.22 % accuracy gain over conventional feature concatenation. Furthermore, the model outperforms the existing deep learning models it is compared against in overall performance. Additionally, to demonstrate the effectiveness of the proposed model structure, we use Class Activation Maps (CAM) to provide visual interpretability for the model.
These results suggest that the dual-branch fusion model effectively reduces false alarms and directs the model's attention toward regions more relevant to accident hotspots. Finally, the code and model used in this study to identify urban traffic accident hotspots are available at https://github.com/gwt-ZJU/TCFGC-Net.
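The abstract does not describe how MBFAF weights the two branches internally. As a hedged illustration only, the sketch below contrasts plain feature concatenation (the baseline the abstract compares against) with one common form of adaptive fusion: each branch receives a scalar gate score, a softmax over branches turns the scores into weights, and the output is the weighted sum of same-dimension branch features. The function names, the gate parameters, and the softmax gating itself are assumptions, not the authors' MBFAF; the input vectors merely stand in for pooled outputs of the satellite (Trans-CFCCNN) and street view (SFRAN) branches. The linked repository holds the actual implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fusion(branches: Sequence[Vector]) -> Vector:
    """Baseline: join branch features end to end; any weighting is left to later layers."""
    fused: Vector = []
    for features in branches:
        fused.extend(features)
    return fused


def adaptive_fusion(
    branches: Sequence[Vector],
    gate_weights: Sequence[Vector],
    gate_biases: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Hypothetical gated fusion (an assumption, not the paper's MBFAF).

    Each branch b gets a scalar score w_b . f_b + bias_b; a softmax across
    branches turns the scores into weights alpha_b, and the fused feature is
    sum_b alpha_b * f_b. Returns (fused feature vector, branch weights).
    """
    if not branches:
        raise ValueError("need at least one branch")
    if not (len(branches) == len(gate_weights) == len(gate_biases)):
        raise ValueError("need one gate weight vector and one bias per branch")
    dim = len(branches[0])
    if any(len(f) != dim for f in branches):
        raise ValueError("branch features must share one dimension")

    # One scalar relevance score per branch.
    scores = [
        sum(w * x for w, x in zip(weights, features)) + bias
        for features, weights, bias in zip(branches, gate_weights, gate_biases)
    ]
    alphas = softmax(scores)
    # Element-wise weighted sum of the branch feature vectors.
    fused = [sum(a * f[i] for a, f in zip(alphas, branches)) for i in range(dim)]
    return fused, alphas


if __name__ == "__main__":
    satellite = [1.0, 0.0, 2.0]  # hypothetical pooled satellite-branch feature
    street = [0.0, 1.0, 2.0]     # hypothetical pooled street-view-branch feature
    print(concat_fusion([satellite, street]))
    # Gate parameters would be learned in training; fixed here for illustration.
    print(adaptive_fusion([satellite, street], [[0.0] * 3, [0.5] * 3], [0.0, 0.0]))
```

Concatenation doubles the feature length and leaves the relative importance of each modality entirely to later layers, whereas the gate computes per-sample branch weights before fusion. This is one plausible reason an adaptive scheme can beat concatenation, but the abstract does not state whether MBFAF works this way.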

Local and Global Context Attentive Fusion Network for Traffic Scene Parsing

Traffic Sign Detection using Feature Fusion and Contextual Information

AEGLR-Net: Attention Enhanced Global-Local Refined Network for Accurate Detection of Car Body Surface Defects

MFCANet: A Road Scene Segmentation Network Based on Multi-Scale Feature Fusion and Context Information Aggregation

Fusion of Satellite and Street View Data for Urban Traffic Accident Hotspot Identification

RBCANet: Recursive and Bidimensional Context Aggregation Network for Lane Detection

Adaptive Context Network for Scene Parsing

Multifeature Selective Fusion Network for Real-Time Driving Scene Parsing

GHAFNet: Global-context hierarchical attention fusion method for traffic object detection

A Global-Local Feature Adaptive Fusion Network for Image Scene Classification

Fully Combined Convolutional Network With Soft Cost Function For Traffic Scene Parsing

Global Context-Aware Progressive Aggregation Network for Salient Object Detection

Lightweight cross-guided contextual perceptive network for visible–infrared urban road scene parsing

CAFFNet: Channel Attention and Feature Fusion Network for Multi-target Traffic Sign Detection

Context-Aware and Attention-Driven Weighted Fusion Traffic Sign Detection Network

Global-Guided Selective Context Network for Scene Parsing

Multi-layer feature fusion and attention enhancement for fine-grained vehicle recognition research

Research on Multi-Task Perception Network of Traffic Scene Based on Feature Fusion

Context Attention Fusion Network for crowd counting

CMNet: A Connect-and-Merge Convolutional Neural Network for Fast Vehicle Detection in Urban Traffic Surveillance

LGCNet: A Local-to-global Context-Aware Feature Augmentation Network for Salient Object Detection