Safe Reinforcement Learning Using Robust Control Barrier Functions

Yousef Emam, Gennaro Notomista, Paul Glotfelter, Zsolt Kira, Magnus Egerstedt

DOI: https://doi.org/10.48550/arXiv.2110.05415

2022-06-23

Abstract: Reinforcement Learning (RL) has been shown to be effective in many scenarios. However, it typically requires the exploration of a sufficiently large number of state-action pairs, some of which may be unsafe. Consequently, its application to safety-critical systems remains a challenge. An increasingly common approach to address safety involves the addition of a safety layer that projects the RL actions onto a safe set of actions. In turn, a difficulty for such frameworks is how to effectively couple RL with the safety layer to improve the learning performance. In this paper, we frame safety as a differentiable robust-control-barrier-function layer in a model-based RL framework. Moreover, we also propose an approach to modularly learn the underlying reward-driven task, independent of safety constraints. We demonstrate that this approach both ensures safety and effectively guides exploration during training in a range of experiments, including zero-shot transfer when the reward is learned in a modular way.

Systems and Control, Artificial Intelligence, Machine Learning, Robotics

What problem does this paper attempt to address?

This paper addresses safe exploration in Reinforcement Learning (RL), particularly for safety-critical systems. Although RL performs well in many scenarios, it usually needs to explore a large number of state-action pairs, some of which may be unsafe, so learning effectively while ensuring safety remains a challenge. The paper proposes a safety layer based on Robust Control Barrier Functions (RCBFs) and embeds it in a model-based RL framework. The approach has three main components:

1. **Safety layer design**: A differentiable RCBF safety layer is proposed. The layer is compatible with standard policy-gradient RL algorithms, supports real-time control synthesis, can handle a wide range of perturbation types even if the function is non-affine, and is applicable to multiple systems.
2. **Modular task learning**: A method is proposed that lets the reward-driven task be learned modularly, independently of some of the constraints. This supports zero-shot transfer to environments whose constraints differ between training and testing, for example a drone that must stay within a certain distance of a safe operator.
3. **Improved learning efficiency**:
   - **Differentiable safety layer**: A differentiable optimization framework allows gradients to be back-propagated through the quadratic program (QP), so the RL loss explicitly accounts for the output of the safety layer, which improves learning performance.
   - **Model-based RL**: Where possible, partially learned dynamics, reward functions, and RCBF constraints are used to generate short-horizon trajectories, further improving the sample efficiency of SAC-RCBF.

Through these methods, the paper aims to keep the system safe during training, guide exploration effectively, and improve both learning efficiency and transferability across environments. The experimental results support the effectiveness of the proposed method, particularly in terms of sample efficiency and zero-shot transfer.
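
To make the idea of a safety layer that projects RL actions onto a safe set more concrete, the following is a minimal sketch and not the paper's implementation. It assumes single-integrator dynamics (state derivative equals action plus disturbance), a disturbance whose norm is bounded by a hypothetical constant `d_max`, and a single circular obstacle encoded by the barrier h(x) = ||x - center||^2 - radius^2. With only one constraint, the QP "stay as close as possible to the RL action while satisfying the robust barrier condition" has a closed-form solution, so no solver is needed. The function name `rcbf_filter` and all parameters are illustrative assumptions.

```python
import math


def rcbf_filter(x, u_rl, center, radius, gamma=1.0, d_max=0.1):
    """Minimally modify u_rl so one robust CBF constraint holds.

    Assumed (hypothetical) setup: x_dot = u + d with ||d|| <= d_max,
    barrier h(x) = ||x - center||^2 - radius^2, where h >= 0 means safe.
    """
    diff = [xi - ci for xi, ci in zip(x, center)]
    h = sum(d * d for d in diff) - radius ** 2
    grad = [2.0 * d for d in diff]  # gradient of h at x
    grad_norm = math.sqrt(sum(g * g for g in grad))

    # Robust condition: grad . u - ||grad|| * d_max >= -gamma * h,
    # written as grad . u >= b with the worst-case disturbance folded into b.
    b = -gamma * h + grad_norm * d_max
    a_dot_u = sum(g * u for g, u in zip(grad, u_rl))

    # Constraint already satisfied (or gradient vanishes, so no action
    # can change h to first order): return the RL action unchanged.
    if a_dot_u >= b or grad_norm == 0.0:
        return list(u_rl)

    # Otherwise project u_rl onto the half-space {u : grad . u >= b}.
    scale = (b - a_dot_u) / (grad_norm ** 2)
    return [u + scale * g for u, g in zip(u_rl, grad)]


# An action heading straight at the obstacle is slowed down;
# an action moving away from it passes through unchanged.
print(rcbf_filter([2.0, 0.0], [-5.0, 0.0], [0.0, 0.0], 1.0))
print(rcbf_filter([2.0, 0.0], [1.0, 0.0], [0.0, 0.0], 1.0))
```

In this sketch the filtered action is a piecewise-affine function of the RL action, which illustrates why gradients can pass through such a layer. The summary above describes a more general setting in which the projection is a QP differentiated by an optimization framework.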

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Model-Free Safe Reinforcement Learning Through Neural Barrier Certificate

Safe Exploration in Reinforcement Learning: Training Backup Control Barrier Functions with Zero Training Time Safety Violations

Disturbance Observer-based Control Barrier Functions with Residual Model Learning for Safe Reinforcement Learning

Reinforcement Learning for Safe Robot Control using Control Lyapunov Barrier Functions

Safe Reinforcement Learning for Dynamical Systems Using Barrier Certificates

Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings

Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate

Safe and Efficient Reinforcement Learning Using Disturbance-Observer-Based Control Barrier Functions

Safe Reinforcement Learning with Dual Robustness

Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis

Safe Reinforcement Learning via a Model-Free Safety Certifier

Safe Reinforcement Learning Using Black-Box Reachability Analysis

Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments

Optimal control barrier functions for RL based safe powertrain control

Learning Control Barrier Functions and their application in Reinforcement Learning: A Survey

Safe Inverse Reinforcement Learning via Control Barrier Function

Stable and Safe Reinforcement Learning via a Barrier-Lyapunov Actor-Critic Approach

Safe Controller for Output Feedback Linear Systems using Model-Based Reinforcement Learning

Reinforcement Learning with Adaptive Regularization for Safe Control of Critical Systems