When, Where, and What? A Novel Benchmark for Accident Anticipation and Localization with Large Language Models

Haicheng Liao, Yongkang Li, Chengyue Wang, Yanchen Guan, KaHou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li

2024-07-26

Abstract: As autonomous driving systems increasingly become part of daily transportation, the ability to accurately anticipate and mitigate potential traffic accidents is paramount. Traditional accident anticipation models primarily utilizing dashcam videos are adept at predicting when an accident may occur but fall short in localizing the incident and identifying involved entities. Addressing this gap, this study introduces a novel framework that integrates Large Language Models (LLMs) to enhance predictive capabilities across multiple dimensions: what, when, and where accidents might occur. We develop an innovative chain-based attention mechanism that dynamically adjusts to prioritize high-risk elements within complex driving scenes. This mechanism is complemented by a three-stage model that processes outputs from smaller models into detailed multimodal inputs for LLMs, thus enabling a more nuanced understanding of traffic dynamics. Empirical validation on the DAD, CCD, and A3D datasets demonstrates superior performance in Average Precision (AP) and Mean Time-To-Accident (mTTA), establishing new benchmarks for accident prediction technology. Our approach not only advances the technological framework for autonomous driving safety but also enhances human-AI interaction, making predictive insights generated by autonomous systems more intuitive and actionable.

Computer Vision and Pattern Recognition, Human-Computer Interaction

What problem does this paper attempt to address?

The paper addresses traffic accident anticipation and localization in autonomous driving systems. Traditional accident anticipation models rely mainly on dashcam videos to predict when an accident will occur, but they fall short in locating the accident and identifying the entities involved. To overcome this limitation, the authors propose a framework that uses Large Language Models (LLMs) to enhance prediction across three dimensions: "when," "where," and "what." Specifically:

1. **Introduction of Accident Localization Task**: Traditional accident anticipation is extended to include localization, so the model predicts not only whether and when an accident will occur but also where it will occur and which entities are involved.
2. **Innovative Attention Mechanism**: A chain-based dynamic attention mechanism (DOA) adjusts attention weights according to the high-risk elements in the traffic scene, so that high-risk targets are prioritized.
3. **Multi-Stage Model Design**: A three-stage model covers feature extraction and fusion, accident prediction and localization, and voice accident warning. The outputs of the smaller models are turned into detailed multimodal inputs for the LLM, giving it a fuller view of traffic dynamics.
4. **Performance Validation**: Experiments on the DAD, CCD, and A3D datasets show that the method outperforms prior approaches in Average Precision (AP) and Mean Time-To-Accident (mTTA), setting new benchmarks on these datasets. The authors also present the approach as improving safety and human-machine interaction in autonomous driving.
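To make the mTTA metric mentioned above concrete, here is a minimal sketch of one common way it is computed in the accident anticipation literature. It is not taken from the paper: the function names `time_to_accident` and `mean_tta`, the threshold grid, and the choice to average only over videos in which the alarm fired are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def time_to_accident(scores: Sequence[float], accident_frame: int,
                     threshold: float, fps: float) -> Optional[float]:
    """Seconds between the first pre-accident frame whose score reaches
    `threshold` and the accident onset; None if the alarm never fires."""
    for frame, score in enumerate(scores[:accident_frame]):
        if score >= threshold:
            return (accident_frame - frame) / fps
    return None


def mean_tta(videos: Sequence[Tuple[Sequence[float], int]],
             thresholds: Sequence[float], fps: float) -> Optional[float]:
    """Average TTA over thresholds (assumed convention, see lead-in).

    `videos` holds (per-frame scores, accident onset frame) pairs for
    accident-positive clips only.
    """
    per_threshold = []
    for threshold in thresholds:
        ttas = [time_to_accident(s, a, threshold, fps) for s, a in videos]
        fired = [t for t in ttas if t is not None]
        if fired:  # skip thresholds at which no video triggered an alarm
            per_threshold.append(sum(fired) / len(fired))
    if not per_threshold:
        return None
    return sum(per_threshold) / len(per_threshold)


# Hypothetical per-frame accident scores for one clip sampled at 2 frames/s,
# with the accident starting at frame index 4.
example_scores = [0.1, 0.2, 0.6, 0.9, 0.95]
example_mtta = mean_tta([(example_scores, 4)], [0.5, 0.9], fps=2.0)
```

A larger mTTA means the model raises its alarm earlier. Because a model can inflate mTTA by firing early on everything, it is usually read alongside AP, which penalizes such false alarms.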

A Deep Learning Method for Lane Changing Situation Assessment and Decision Making

Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling

Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving

A Multimodal Data-Driven Approach for Driving Risk Assessment

CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions

Empowering Autonomous Driving with Large Language Models: A Safety Perspective

LLM Multimodal Traffic Accident Forecasting

Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment

SafeDrive: Knowledge- and Data-Driven Risk-Sensitive Decision-Making for Autonomous Vehicles with Large Language Models

LLM4Drive: A Survey of Large Language Models for Autonomous Driving

Advancing Autonomous Driving Safety Through LLM Enhanced Trajectory Prediction

AccidentGPT: Accident Analysis and Prevention from V2X Environmental Perception with Multi-modal Large Model

Receive, Reason, and React: Drive as You Say, With Large Language Models in Autonomous Vehicles

A Survey on Multimodal Large Language Models for Autonomous Driving

Driving Everywhere with Large Language Model Policy Adaptation

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving