Abstract: Ensuring the safety of autonomous vehicles (AVs) requires identifying rare but critical failure cases that on-road testing alone cannot discover. High-fidelity simulations provide a scalable alternative, but automatically generating realistic and diverse traffic scenarios that can effectively stress test AV motion planners remains a key challenge. This paper introduces CRASH - Challenging Reinforcement-learning based Adversarial scenarios for Safety Hardening - an adversarial deep reinforcement learning framework to address this issue. First, CRASH can control adversarial Non-Player Character (NPC) agents in an AV simulator to automatically induce collisions with the Ego vehicle, falsifying its motion planner. We also propose a novel approach that we term safety hardening, which iteratively refines the motion planner by simulating improvement scenarios against adversarial agents, leveraging the failure cases to strengthen the AV stack. CRASH is evaluated on a simplified two-lane highway scenario, demonstrating its ability to falsify both rule-based and learning-based planners with collision rates exceeding 90%. Additionally, safety hardening reduces the Ego vehicle's collision rate by 26%. While preliminary, these results highlight RL-based safety hardening as a promising approach for scenario-driven simulation testing for autonomous vehicles.

What problem does this paper attempt to address?

The paper addresses the safety of autonomous vehicles (AVs) in the face of rare but critical failure cases. On-road testing alone is unlikely to uncover these cases, because they occur too rarely to surface in routine tests. To address this, the paper proposes CRASH (Challenging Reinforcement-learning based Adversarial scenarios for Safety Hardening), an adversarial deep reinforcement learning framework that automatically generates realistic and diverse traffic scenarios to stress test AV motion planners.

### Main Problems

1. **Automatic Falsification**:
   - The goal is to discover scenarios that lead to system failures by optimizing the behavior of non-player characters (NPCs) so that they collide with the autonomous vehicle (the Ego vehicle).
   - Specifically, CRASH trains NPC agents to maximize the number of collisions with the Ego vehicle. The objective is to find an optimal policy \(\pi^*_{\text{NPC}}(S_t, A_t)\) that maximizes the cumulative number of collisions:
     \[ \pi^*_{\text{NPC}}(S_t, A_t)=\arg\max_{A_t}\left(\sum_{e = 0}^{E}\phi\right) \]
     where \(\phi\) is a binary indicator that is 1 when a collision is detected and 0 otherwise, and \(E\) is the total number of episodes.
2. **Safety Hardening**:
   - Once failure-causing scenarios have been discovered through automatic falsification, the next step is to make the Ego vehicle's motion planner more robust through iterative training, reducing its collision rate.
   - The goal of safety hardening is to find an optimal policy \(\pi^*_{\text{Ego}}(S_t, A_t)\) that minimizes the cumulative number of collisions:
     \[ \pi^*_{\text{Ego}}(S_t, A_t)=\arg\min_{A_t}\left(\sum_{e = 0}^{E}\phi\right) \]
   - This is a two-level optimization problem in which the Ego and NPC policies are optimized in turn: the NPC generates adversarial scenarios, and the Ego learns to avoid collisions in them.

### Method Overview

- **Automatic Falsification**: Train NPC agents with a Deep Q-Network (DQN) and an adversarial reward function that combines a collision reward with a continuous, time-dependent reward based on Time-to-Collision (TTC) to guide NPC behavior.
- **Safety Hardening**: Progressively improve the robustness of the Ego vehicle using three methods: local safety hardening, uniform model-pool-based safety hardening, and priority model-pool-based safety hardening.

### Experimental Results

- **Experimental Setup**: The highway-env simulator is used to test rule-based and learning-based motion planners. Each episode starts from one of 8 initial configurations, selected at random.
- **Automatic Falsification Results**: NPCs trained with the continuous reward function (especially with weights \(w_1 = 400\), \(w_2 = 4\), \(w_3 = 1\)) learn to generate adversarial scenarios more effectively across the initial configurations, and the collision rate increases significantly.
- **Safety Hardening Results**: Safety hardening significantly reduces the Ego vehicle's collision rate; after multiple training cycles, the collision rate is reduced by 26%.

In conclusion, the CRASH framework provides a systematic method for improving the safety of autonomous vehicles by generating adversarial scenarios and iteratively improving motion planners.
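To make the adversarial reward from the method overview more concrete, the following is a minimal Python sketch of a reward that combines a sparse collision term with a dense TTC-based term, plus a helper that averages the collision indicator \(\phi\) over episodes. The functional form, the function names, the default weights `w_collision` and `w_ttc`, and the 5-second horizon are all illustrative assumptions. They are not taken from the paper, and the weights are not meant to correspond to \(w_1\), \(w_2\), \(w_3\).

```python
import math


def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """Longitudinal TTC: gap divided by closing speed; infinite if the gap is not closing."""
    if closing_speed_mps <= 0.0:
        return math.inf
    return gap_m / closing_speed_mps


def adversarial_reward(collided: bool, ttc_s: float,
                       w_collision: float = 1.0, w_ttc: float = 0.1,
                       ttc_horizon_s: float = 5.0) -> float:
    """Hypothetical NPC reward (assumed form, not the paper's formula)."""
    # phi is the binary collision indicator from the objective above.
    phi = 1.0 if collided else 0.0
    # Dense shaping term: 1 at TTC = 0, falling linearly to 0 at the horizon.
    shaping = max(0.0, 1.0 - ttc_s / ttc_horizon_s)
    return w_collision * phi + w_ttc * shaping


def collision_rate(collision_flags) -> float:
    """Fraction of episodes whose collision indicator phi equals 1."""
    flags = list(collision_flags)
    return sum(flags) / len(flags) if flags else 0.0
```

Under these assumptions, the dense term gives the NPC a learning signal for closing in on the Ego vehicle before any collision occurs. This is one plausible reason a continuous reward could train more effectively than a collision-only reward.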

CRASH: Challenging Reinforcement-Learning Based Adversarial Scenarios For Safety Hardening
