RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes

Kyle Stachowicz, Sergey Levine
2024-05-08
Abstract: Reinforcement learning provides an appealing framework for robotic control due to its ability to learn expressive policies purely through real-world interaction. However, this requires addressing real-world constraints and avoiding catastrophic failures during training, which might severely impede both learning progress and the performance of the final policy. In many robotics settings, this amounts to avoiding certain "unsafe" states. The high-speed off-road driving task represents a particularly challenging instantiation of this problem: a high-return policy should drive as aggressively and as quickly as possible, which often requires getting close to the edge of the set of "safe" states, and therefore places a particular burden on the method to avoid frequent failures.
Robotics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper studies how reinforcement learning (RL) can learn high-performance behavior in high-risk environments while staying safe during training. It proposes a method called **RACER (Risk-sensitive Actor Critic with Epistemic Robustness)**, which targets the following core issues:

1. **Avoiding Catastrophic Failures During Training**:
   - In real-world robot control, catastrophic events during training (such as collisions or rollovers) can stall learning progress and require costly human intervention to reset the robot. The proposed method aims to reduce how often these events occur.
2. **Balancing Performance and Safety**:
   - High-speed off-road driving is a particularly challenging case: a high-return policy must drive as fast as possible, which often means operating near the boundary of the set of "safe" states, so the method carries a heavier burden to avoid frequent failures. The paper addresses this by combining risk-sensitive control with an adaptive action space curriculum.
3. **Handling Uncertainty and Rare Events**:
   - In many high-performance settings, it is easy to obtain robust but low-performance behavior (such as low-speed driving) and then gradually improve performance over time. To support this, the paper proposes a risk-sensitive framework that can model and account for low-probability failure events, even when the agent is uncertain about them.

### Summary

The core contribution of the paper is a method that lets real-world robotic systems learn high-performance behaviors (such as high-speed driving) while minimizing the number of failures during training. By combining distributional reinforcement learning with a Conditional Value at Risk (CVaR) objective, RACER can handle uncertainty and achieve a more robust learning process.
Experimental results show that on actual autonomous vehicles, the RACER method can achieve higher final performance while significantly reducing the number of failures during training.
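
To illustrate the CVaR objective mentioned in the summary, here is a minimal Python sketch. It is not the paper's implementation. It assumes the return distribution is represented by equally weighted samples and estimates CVaR at level `alpha` as the mean of the worst `alpha` fraction of those samples. The function names `cvar` and `pick_action`, the action labels, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def cvar(samples, alpha):
    """Estimate CVaR at level alpha: the mean of the worst alpha-fraction
    of equally weighted return samples (lower return = worse)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)                       # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))     # size of the lower tail
    tail = ordered[:k]
    return sum(tail) / len(tail)


def pick_action(return_samples_by_action, alpha):
    """Choose the action whose return samples have the highest CVaR.
    With alpha = 1.0 this reduces to maximizing the ordinary mean."""
    return max(
        return_samples_by_action,
        key=lambda action: cvar(return_samples_by_action[action], alpha),
    )


# Hypothetical return samples: "fast" usually pays more but occasionally crashes.
returns = {
    "fast": [12.0] * 9 + [-50.0],
    "slow": [4.0] * 10,
}

risk_neutral_choice = pick_action(returns, alpha=1.0)    # compares plain means
risk_averse_choice = pick_action(returns, alpha=0.2)     # compares worst 20%
```

In this toy setup the risk-neutral criterion prefers the fast action because its mean return is higher, while the CVaR criterion at `alpha = 0.2` prefers the slow action because the rare crash dominates the lower tail. This shows the general trade-off a risk-sensitive objective makes. How RACER parameterizes the return distribution and handles epistemic uncertainty is not covered by this sketch.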