CRASH: Challenging Reinforcement-Learning Based Adversarial Scenarios For Safety Hardening

Amar Kulkarni, Shangtong Zhang, Madhur Behl
2024-11-26
Abstract: Ensuring the safety of autonomous vehicles (AVs) requires identifying rare but critical failure cases that on-road testing alone cannot discover. High-fidelity simulations provide a scalable alternative, but automatically generating realistic and diverse traffic scenarios that can effectively stress test AV motion planners remains a key challenge. This paper introduces CRASH (Challenging Reinforcement-learning based Adversarial scenarios for Safety Hardening), an adversarial deep reinforcement learning framework to address this issue. First, CRASH can control adversarial Non Player Character (NPC) agents in an AV simulator to automatically induce collisions with the Ego vehicle, falsifying its motion planner. We also propose a novel approach, which we term safety hardening, that iteratively refines the motion planner by simulating improvement scenarios against adversarial agents, leveraging the failure cases to strengthen the AV stack. CRASH is evaluated on a simplified two-lane highway scenario, demonstrating its ability to falsify both rule-based and learning-based planners with collision rates exceeding 90%. Additionally, safety hardening reduces the Ego vehicle's collision rate by 26%. While preliminary, these results highlight RL-based safety hardening as a promising approach for scenario-driven simulation testing for autonomous vehicles.
Machine Learning, Robotics
What problem does this paper attempt to address?
The paper addresses the safety of autonomous vehicles (AVs) in the face of rare but critical failure cases. On-road testing alone is limited here, because these situations are so rare that routine testing is unlikely to encounter them. To address this, the paper proposes CRASH (Challenging Reinforcement-learning based Adversarial Scenarios for Safety Hardening), an adversarial deep reinforcement learning framework that automatically generates realistic and diverse traffic scenarios to stress test AV motion planners.

### Main Problems

1. **Automatic Falsification**:
   - The goal is to discover scenarios that lead to system failures by optimizing the behavior of non-player character (NPC) agents so that they collide with the autonomous vehicle (the Ego vehicle).
   - Specifically, CRASH trains NPC agents to maximize the number of collisions with the Ego vehicle. The objective is to find an optimal policy \(\pi^*_{\text{NPC}}(S_t, A_t)\) that maximizes the cumulative number of collisions:

     \[ \pi^*_{\text{NPC}}(S_t, A_t)=\arg\max_{A_t}\left(\sum_{e = 0}^{E}\phi\right) \]

     where \(\phi\) is a binary indicator that is 1 when a collision is detected and 0 otherwise, and \(E\) is the total number of episodes.

2. **Safety Hardening**:
   - Once failure-causing scenarios have been found through automatic falsification, the next step is to make the Ego vehicle's motion planner more robust through iterative training and thereby reduce the collision rate.
   - The goal of safety hardening is to find an optimal policy \(\pi^*_{\text{Ego}}(S_t, A_t)\) that minimizes the cumulative number of collisions:

     \[ \pi^*_{\text{Ego}}(S_t, A_t)=\arg\min_{A_t}\left(\sum_{e = 0}^{E}\phi\right) \]

   - This is a two-level optimization problem in which the Ego and NPC policies are optimized in turn: the NPC generates adversarial scenarios, and the Ego learns to avoid collisions in them.

### Method Overview

- **Automatic Falsification**: Train NPC agents with a Deep Q-Network (DQN) and an adversarial reward function that combines collision rewards with time-continuous rewards based on Time-to-Collision (TTC) to guide NPC behavior.
- **Safety Hardening**: Improve the robustness of the Ego vehicle step by step using three methods: local safety hardening, uniform model-pool-based safety hardening, and priority model-pool-based safety hardening.

### Experimental Results

- **Experimental Setup**: The highway-env simulator is used to test rule-based and learning-based motion planners. Each episode starts from one of 8 initial configurations, selected at random.
- **Automatic Falsification Results**: NPCs trained with the continuous reward function (especially with weights \(w_1 = 400\), \(w_2 = 4\), \(w_3 = 1\)) learn to generate adversarial scenarios more effectively across the initial configurations, and the collision rate increases significantly. Overall, CRASH falsifies both rule-based and learning-based planners with collision rates exceeding 90%.
- **Safety Hardening Results**: Safety hardening significantly lowers the Ego vehicle's collision rate; after multiple training cycles, the collision rate is reduced by 26%.

In conclusion, the CRASH framework provides a systematic method for improving the safety of autonomous vehicles by generating adversarial scenarios and iteratively improving motion planners.
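
To make the falsification objective and the TTC-based reward more concrete, the following is a minimal Python sketch, not the authors' implementation. The summary does not give the exact reward formula or say which terms the weights \(w_1\), \(w_2\), \(w_3\) multiply, so the function names, the two-term reward shape, the inverse-TTC dense term, and the default weights (`w_collision`, `w_ttc`, `ttc_floor_s`) are all illustrative assumptions. Only the TTC definition (gap divided by closing speed) and the collision rate (sum of \(\phi\) over episodes, divided by \(E\)) follow standard usage.

```python
import math
from typing import Sequence


def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """TTC for two vehicles in the same lane: gap divided by closing speed.

    Returns math.inf when the vehicles are not closing on each other.
    """
    if closing_speed_mps <= 0.0:
        return math.inf
    return max(gap_m, 0.0) / closing_speed_mps


def adversarial_reward(
    collided: bool,
    ttc_s: float,
    w_collision: float = 1.0,   # hypothetical weight, not taken from the paper
    w_ttc: float = 0.1,         # hypothetical weight, not taken from the paper
    ttc_floor_s: float = 0.1,   # assumed floor that keeps the dense term bounded
) -> float:
    """Hypothetical NPC reward: a sparse collision bonus plus a dense term
    that grows as TTC shrinks (assumed shape; the paper's formula may differ)."""
    collision_term = w_collision if collided else 0.0
    dense_term = 0.0 if math.isinf(ttc_s) else w_ttc / max(ttc_s, ttc_floor_s)
    return collision_term + dense_term


def collision_rate(collision_flags: Sequence[bool]) -> float:
    """Fraction of episodes that ended in a collision: (sum of phi) / E."""
    if not collision_flags:
        return 0.0
    return sum(collision_flags) / len(collision_flags)


if __name__ == "__main__":
    ttc = time_to_collision(gap_m=20.0, closing_speed_mps=5.0)
    print("TTC [s]:", ttc)
    print("reward, no collision:", adversarial_reward(False, ttc))
    print("collision rate:", collision_rate([True, False, True, True]))
```

Under these assumptions, the NPC maximizes `adversarial_reward`, while the Ego's hardening objective corresponds to driving `collision_rate` down over the same kind of episodes. A dense TTC term of this sort gives the NPC a learning signal even in episodes that end without a collision.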