Abstract: Reinforcement learning provides an appealing framework for robotic control due to its ability to learn expressive policies purely through real-world interaction. However, this requires addressing real-world constraints and avoiding catastrophic failures during training, which might severely impede both learning progress and the performance of the final policy. In many robotics settings, this amounts to avoiding certain "unsafe" states. The high-speed off-road driving task represents a particularly challenging instantiation of this problem: a high-return policy should drive as aggressively and as quickly as possible, which often requires getting close to the edge of the set of "safe" states, and therefore places a particular burden on the method to avoid frequent failures.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper explores how to combine safety with high-performance behavior when Reinforcement Learning (RL) is applied in high-risk environments. It proposes a method called **RACER (Risk-sensitive Actor Critic with Epistemic Robustness)**, which targets the following core issues:

1. **Avoiding Catastrophic Failures During Training**:
   - In real-world robot control, catastrophic events during training (such as collisions or rollovers) can stall the learning progress of standard reinforcement learning methods and require costly human intervention to reset the robot. The proposed method therefore aims to reduce how often these events occur.

2. **Balancing Performance and Safety**:
   - High-speed off-road driving is a particularly challenging example: a high-reward policy must drive as fast as possible, often near the boundary of the "safe" state set, which places a heavier burden on the method to avoid frequent failures. The paper addresses this by combining risk-sensitive control with an adaptive action space curriculum.

3. **Handling Uncertainty and Rare Events**:
   - In many high-performance settings, it is easy to obtain robust but low-performance behavior (such as low-speed driving) and then gradually improve performance over time. To support this, the paper proposes a risk-sensitive framework that can model and account for low-probability anomalous events, even when these events are uncertain.

### Summary

The core contribution of the paper is a method that lets real-world robotic systems learn high-performance behaviors (such as high-speed driving) while minimizing the number of failures during training. By combining distributional reinforcement learning with a Conditional Value at Risk (CVaR) objective, RACER can account for uncertainty and achieve a more robust learning process. Experimental results show that, on a real autonomous vehicle, RACER achieves higher final performance while significantly reducing the number of failures during training.
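
To make the CVaR objective mentioned above concrete, here is a minimal sketch of how a risk-sensitive score could be computed from return samples. It is not RACER's implementation. It assumes a distributional critic that outputs equally weighted return samples for each candidate action. The function names `cvar` and `pick_by_cvar` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def cvar(returns, alpha):
    """Average of the worst `alpha` fraction of equally weighted return samples.

    With alpha = 1.0 this is the ordinary mean; smaller alpha values
    focus on the low-return tail and are therefore more risk-averse.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)                    # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))  # size of the tail to average
    tail = ordered[:k]
    return sum(tail) / len(tail)


def pick_by_cvar(candidates, alpha):
    """Return the name of the candidate whose return samples have the highest CVaR."""
    return max(candidates, key=lambda name: cvar(candidates[name], alpha))


# Hypothetical return samples for two driving behaviors (illustrative values only).
candidates = {
    "cautious": [9, 10, 10, 11],      # slow, but never crashes
    "aggressive": [-10, 18, 20, 20],  # faster on average, with a rare crash
}

risk_neutral_choice = pick_by_cvar(candidates, alpha=1.0)   # compares plain means
risk_averse_choice = pick_by_cvar(candidates, alpha=0.25)   # compares worst 25%
```

In this toy example the risk-neutral criterion prefers the aggressive behavior because its mean return is higher, while the CVaR criterion at `alpha=0.25` prefers the cautious one because it scores each behavior by its worst outcomes.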

RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes

FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing

Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Risk Averse Robust Adversarial Reinforcement Learning

Pay Attention to How You Drive: Safe and Adaptive Model-Based Reinforcement Learning for Off-Road Driving

Residual Policy Learning Facilitates Efficient Model-Free Autonomous Racing

Constrained Residual Race: an Efficient Hybrid Controller for Autonomous Racing

Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving

Multi-policy Soft Actor-Critic Reinforcement Learning for Autonomous Racing

Towards Robust Decision-Making for Autonomous Highway Driving Based on Safe Reinforcement Learning

Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning

Vehicle Extreme Control Based on Offline Reinforcement Learning

Bypassing the Simulation-to-reality Gap: Online Reinforcement Learning using a Supervisor

Learning from Simulation, Racing in Reality

Reachability-Based Trajectory Safeguard (RTS): A Safe and Fast Reinforcement Learning Safety Layer for Continuous Control

Towards Robust Decision-Making for Autonomous Driving on Highway

Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles

Safe Reinforcement Learning for a Robot Being Pursued but with Objectives Covering More Than Capture-avoidance

Learning to Drive from a World on Rails

Model-Based Safe Reinforcement Learning With Time-Varying Constraints: Applications to Intelligent Vehicles