LeGEND: A Top-Down Approach to Scenario Generation of Autonomous Driving Systems Assisted by Large Language Models

Shuncheng Tang, Zhenya Zhang, Jixiang Zhou, Lei Lei, Yuan Zhou, Yinxing Xue
2024-09-16
Abstract: Autonomous driving systems (ADS) are safety-critical and require comprehensive testing before their deployment on public roads. While existing testing approaches primarily aim at the criticality of scenarios, they often overlook the diversity of the generated scenarios that is also important to reflect system defects in different aspects. To bridge the gap, we propose LeGEND, that features a top-down fashion of scenario generation: it starts with abstract functional scenarios, and then steps downwards to logical and concrete scenarios, such that scenario diversity can be controlled at the functional level. However, unlike logical scenarios that can be formally described, functional scenarios are often documented in natural languages (e.g., accident reports) and thus cannot be precisely parsed and processed by computers. To tackle that issue, LeGEND leverages the recent advances of large language models (LLMs) to transform textual functional scenarios to formal logical scenarios. To mitigate the distraction of useless information in functional scenario description, we devise a two-phase transformation that features the use of an intermediate language; consequently, we adopt two LLMs in LeGEND, one for extracting information from functional scenarios, the other for converting the extracted information to formal logical scenarios. We experimentally evaluate LeGEND on Apollo, an industry-grade ADS from Baidu. Evaluation results show that LeGEND can effectively identify critical scenarios, and compared to baseline approaches, LeGEND exhibits evident superiority in diversity of generated scenarios. Moreover, we also demonstrate the advantages of our two-phase transformation framework, and the accuracy of the adopted LLMs.
Software Engineering
What problem does this paper attempt to address?
The paper addresses two issues in the testing of autonomous driving systems (ADS): the **diversity** and the **criticality** of generated scenarios.

1. **Diversity Issue**: Existing testing methods focus on generating critical scenarios that expose system defects, but often overlook how diverse those scenarios are. As a result, the generated scenarios may trigger similar ADS behaviors and reveal similar defects. Some methods treat diversity as a search objective, but the effect is limited because the scenarios they find remain confined to a given logical scenario.
2. **Criticality Issue**: ADS must be comprehensively tested before deployment on public roads. Existing methods usually start from logical scenarios, which define the environment (such as road structure and weather) and the traffic participants but leave open a state space spanned by variables such as the initial states of vehicles. An optimization search over that space then finds critical concrete scenarios. On its own, this approach is insufficient for generating diverse scenarios.

To address these issues, the paper proposes LeGEND, which adopts a top-down scenario generation strategy: it starts from functional-level scenarios and steps down to logical-level and then concrete-level scenarios. The goal is to generate scenarios that are both critical and diverse. Specifically:

- **Functional-Level Scenarios**: Described in natural language, these give a conceptual account of events, including the key actions and interactions within the scenario.
- **Logical-Level Scenarios**: Functional-level scenarios are converted into formal logical scenarios that computers can parse and process.
- **Concrete-Level Scenarios**: Searching within the state space of a logical scenario yields concrete scenarios that are used to test the behavior of the ADS.

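
The step from a logical scenario to concrete scenarios can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `LogicalScenario` schema, the variable names `npc_speed_mps` and `initial_gap_m`, and the `toy_criticality` score are hypothetical, and plain random sampling stands in for the optimization search the summary mentions. In practice the score would come from running the ADS in a simulator.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalScenario:
    """Fixed environment plus value ranges for the free variables (hypothetical schema)."""
    road: str
    weather: str
    param_ranges: dict  # variable name -> (low, high)


def sample_concrete(logical, rng):
    """Pick one value per free variable, which yields one concrete scenario."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in logical.param_ranges.items()}


def toy_criticality(concrete):
    """Stand-in for a simulator run: a faster NPC and a smaller gap score as more critical."""
    return concrete["npc_speed_mps"] / concrete["initial_gap_m"]


def random_search(logical, budget, seed=0):
    """Return the most critical concrete scenario among `budget` random samples."""
    rng = random.Random(seed)
    candidates = [sample_concrete(logical, rng) for _ in range(budget)]
    return max(candidates, key=toy_criticality)


# Hypothetical logical scenario: environment is fixed, two variables are left open.
logical = LogicalScenario(
    road="two-lane straight road",
    weather="clear",
    param_ranges={"npc_speed_mps": (5.0, 20.0), "initial_gap_m": (5.0, 50.0)},
)
best = random_search(logical, budget=200)
```

The sketch also shows why diversity is hard to obtain at this level: every candidate shares the same road, weather, and participants, so variation is limited to the numeric values inside `param_ranges`.
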
Additionally, LeGEND utilizes Large Language Models (LLMs) to handle natural language input and improve the accuracy of conversion through a two-stage conversion framework. In the first stage, LLM1 extracts useful information from accident reports and records it as an Interaction Pattern Sequence (IPS). In the second stage, LLM2 converts the IPS into formal logical scenarios. This design allows LeGEND to control the diversity of scenarios at the functional level, thereby generating more diverse critical scenarios.
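
The two-stage conversion described above can be sketched as two chained functions. This is only an illustration under stated assumptions: the prompts, the one-step-per-line IPS encoding, the output syntax, and the stub models `fake_llm1` and `fake_llm2` are all hypothetical and are not LeGEND's actual prompts, intermediate language, or scenario format. Any callable that maps a prompt string to a completion string could be substituted for the stubs.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # prompt in, completion out (assumed interface)


def extract_ips(report: str, llm1: LLM) -> List[str]:
    """Stage 1: ask the first model for the ordered interactions, one per line."""
    prompt = "List the vehicle interactions in order, one per line:\n" + report
    return [line.strip() for line in llm1(prompt).splitlines() if line.strip()]


def ips_to_logical(ips: List[str], llm2: LLM) -> str:
    """Stage 2: the second model sees only the extracted steps, not the raw report."""
    prompt = "Translate these steps into a formal scenario:\n" + "\n".join(ips)
    return llm2(prompt)


def transform(report: str, llm1: LLM, llm2: LLM) -> str:
    """Chain both stages: report -> IPS -> logical scenario text."""
    return ips_to_logical(extract_ips(report, llm1), llm2)


# Deterministic stubs standing in for real models, so the sketch runs offline.
def fake_llm1(prompt: str) -> str:
    return "V1 follows ego\n\nV1 changes lane left\n"


def fake_llm2(prompt: str) -> str:
    steps = prompt.splitlines()[1:]  # drop the instruction line
    return "scenario { " + "; ".join(steps) + " }"


# Made-up report text containing an irrelevant detail (the time of day).
report = (
    "Vehicle 1 was traveling behind the subject vehicle on a sunny afternoon "
    "and then moved into the left lane."
)
logical_text = transform(report, fake_llm1, fake_llm2)
print(logical_text)  # scenario { V1 follows ego; V1 changes lane left }
```

Splitting the work this way keeps irrelevant report details away from the second model, which matches the abstract's stated motivation of mitigating distraction from useless information.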