Abstract: Autonomous driving systems (ADS) are safety-critical and require comprehensive testing before their deployment on public roads. While existing testing approaches primarily aim at the criticality of scenarios, they often overlook the diversity of the generated scenarios, which is also important for exposing system defects of different kinds. To bridge the gap, we propose LeGEND, which generates scenarios in a top-down fashion: it starts with abstract functional scenarios, and then steps downwards to logical and concrete scenarios, such that scenario diversity can be controlled at the functional level. However, unlike logical scenarios, which can be formally described, functional scenarios are often documented in natural language (e.g., accident reports) and thus cannot be precisely parsed and processed by computers. To tackle that issue, LeGEND leverages recent advances in large language models (LLMs) to transform textual functional scenarios into formal logical scenarios. To mitigate the distraction caused by irrelevant information in functional scenario descriptions, we devise a two-phase transformation that features the use of an intermediate language; consequently, we adopt two LLMs in LeGEND, one for extracting information from functional scenarios, the other for converting the extracted information to formal logical scenarios. We experimentally evaluate LeGEND on Apollo, an industry-grade ADS from Baidu. Evaluation results show that LeGEND can effectively identify critical scenarios, and compared to baseline approaches, LeGEND exhibits evident superiority in the diversity of generated scenarios. Moreover, we also demonstrate the advantages of our two-phase transformation framework, and the accuracy of the adopted LLMs.

What problem does this paper attempt to address?

The paper aims to address two main issues in the testing of Autonomous Driving Systems (ADS): **diversity and criticality of scenarios**.

1. **Diversity Issue**: Existing testing methods mainly focus on generating critical scenarios that can expose system defects but often overlook the diversity of these scenarios. As a result, the generated scenarios may exhibit similar ADS behaviors and reflect similar system defects. Some methods attempt to mitigate this issue by treating diversity as a search objective, but the effect is still limited because the detected scenarios remain constrained by the logical scenario they were derived from.
2. **Criticality Issue**: To ensure the safety of ADS, comprehensive testing is required before deployment on public roads. Existing testing methods usually start from logical scenarios, which define the environment (such as road structure and weather) and the traffic participants, but leave open a state space (identified by multiple variables, such as the initial state of vehicles) in which critical concrete scenarios are found through optimization search. However, these methods are insufficient for generating diverse scenarios.

To address these issues, the paper proposes a method called LeGEND, which adopts a top-down scenario generation strategy: it starts from functional-level scenarios and gradually generates logical-level and concrete-level scenarios.

- **Functional-Level Scenarios**: These scenarios are described in natural language, providing a conceptual description of events, including key actions and interactions within the scenario.
- **Logical-Level Scenarios**: Functional-level scenarios are converted into formal logical scenarios, which can be parsed and processed by computers.
- **Concrete-Level Scenarios**: Concrete scenarios are generated by searching within the state space of a logical scenario, and are used to test the behavior of the ADS.

This method not only generates critical scenarios but also increases the diversity of the generated scenarios.
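
The step from a logical scenario to concrete scenarios can be sketched as follows. This is a minimal illustration, not LeGEND's implementation: the class `LogicalScenario`, the parameter names, and the score `toy_time_to_collision` are all hypothetical, the score stands in for an actual simulator run against the ADS, and plain random search replaces whatever optimization search a real tool would use.

```python
import random
from dataclasses import dataclass


@dataclass
class LogicalScenario:
    """A formal scenario whose open parameters are bounded by (low, high) ranges."""
    name: str
    parameter_ranges: dict[str, tuple[float, float]]

    def sample(self, rng: random.Random) -> dict[str, float]:
        """Fix every open parameter to obtain one concrete scenario."""
        return {
            key: rng.uniform(low, high)
            for key, (low, high) in self.parameter_ranges.items()
        }


def toy_time_to_collision(concrete: dict[str, float]) -> float:
    """Hypothetical criticality score standing in for a simulator run.

    Lower values mean the ego vehicle closes the gap to the lead vehicle sooner.
    """
    closing_speed = concrete["ego_speed_mps"] - concrete["lead_speed_mps"]
    if closing_speed <= 0:
        return float("inf")  # the gap never closes, so nothing critical happens
    return concrete["initial_gap_m"] / closing_speed


def random_search(logical, score, budget, seed=0):
    """Sample `budget` concrete scenarios and keep the one with the lowest score."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(budget):
        candidate = logical.sample(rng)
        candidate_score = score(candidate)
        if best is None or candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Assumed example: one logical scenario with three open parameters.
cut_in = LogicalScenario(
    name="lead vehicle cuts in ahead of ego",
    parameter_ranges={
        "ego_speed_mps": (10.0, 20.0),
        "lead_speed_mps": (5.0, 15.0),
        "initial_gap_m": (10.0, 50.0),
    },
)
best_concrete, best_score = random_search(cut_in, toy_time_to_collision, budget=200)
```

Every concrete scenario found this way shares the structure of `cut_in`, which illustrates why searching within a single logical scenario limits diversity: only the parameter values vary, never the kind of interaction.
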
Additionally, LeGEND utilizes Large Language Models (LLMs) to handle natural language input, and it improves conversion accuracy by splitting the conversion into two stages. In the first stage, LLM1 extracts useful information from accident reports and records it as an Interaction Pattern Sequence (IPS). In the second stage, LLM2 converts the IPS into formal logical scenarios. This design allows LeGEND to control the diversity of scenarios at the functional level, thereby generating more diverse critical scenarios.
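
The two-stage flow can be sketched as two chained calls. Everything below is an assumption made for illustration: the `LLM` callable interface, the prompt texts, the one-line-per-interaction IPS format, and the output format are hypothetical, and `stub_llm1` and `stub_llm2` are simple string functions standing in for real models so that the sketch runs without any model access.

```python
from typing import Callable

# Hypothetical interface: a model is any function from prompt text to completion text.
LLM = Callable[[str], str]

# Hypothetical prompts; the report or IPS is placed after the first line break.
EXTRACT_PROMPT = "List the interactions between vehicles, one per line:\n{report}"
CONVERT_PROMPT = "Translate each interaction into a formal scenario statement:\n{ips}"


def extract_ips(report: str, llm1: LLM) -> list[str]:
    """Stage 1: reduce a free-text report to a sequence of interactions."""
    completion = llm1(EXTRACT_PROMPT.format(report=report))
    return [line.strip() for line in completion.splitlines() if line.strip()]


def ips_to_logical(ips: list[str], llm2: LLM) -> str:
    """Stage 2: turn the intermediate sequence into a formal description."""
    return llm2(CONVERT_PROMPT.format(ips="\n".join(ips)))


def transform(report: str, llm1: LLM, llm2: LLM) -> str:
    """Chain both stages, so stage 2 never sees the raw report."""
    return ips_to_logical(extract_ips(report, llm1), llm2)


def stub_llm1(prompt: str) -> str:
    """Stand-in for LLM1: keeps only sentences that mention a driving action."""
    actions = ("changed lanes", "braked", "turned")
    report = prompt.split("\n", 1)[1]
    sentences = [s.strip() for s in report.split(".") if s.strip()]
    return "\n".join(s for s in sentences if any(a in s for a in actions))


def stub_llm2(prompt: str) -> str:
    """Stand-in for LLM2: numbers each interaction as an ordered step."""
    steps = prompt.split("\n", 1)[1].splitlines()
    return "\n".join(f"step {i}: {step}" for i, step in enumerate(steps, 1))


report = (
    "It was raining at 5 pm. V1 changed lanes in front of V2. "
    "The driver of V2 was 34 years old. V2 braked hard."
)
logical_scenario = transform(report, stub_llm1, stub_llm2)
print(logical_scenario)
```

The point of the intermediate sequence is visible even with the stubs: details such as the time of day and the driver's age are dropped in stage 1, so the second model only has to translate the interactions themselves.
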

LeGEND: A Top-Down Approach to Scenario Generation of Autonomous Driving Systems Assisted by Large Language Models
