Safe Reinforcement Learning Using Robust Control Barrier Functions

Yousef Emam, Gennaro Notomista, Paul Glotfelter, Zsolt Kira, Magnus Egerstedt
DOI: https://doi.org/10.48550/arXiv.2110.05415
2022-06-23
Abstract: Reinforcement Learning (RL) has been shown to be effective in many scenarios. However, it typically requires the exploration of a sufficiently large number of state-action pairs, some of which may be unsafe. Consequently, its application to safety-critical systems remains a challenge. An increasingly common approach to address safety involves the addition of a safety layer that projects the RL actions onto a safe set of actions. In turn, a difficulty for such frameworks is how to effectively couple RL with the safety layer to improve the learning performance. In this paper, we frame safety as a differentiable robust-control-barrier-function layer in a model-based RL framework. Moreover, we also propose an approach to modularly learn the underlying reward-driven task, independent of safety constraints. We demonstrate that this approach both ensures safety and effectively guides exploration during training in a range of experiments, including zero-shot transfer when the reward is learned in a modular way.
Systems and Control, Artificial Intelligence, Machine Learning, Robotics
What problem does this paper attempt to address?
This paper addresses safe exploration in Reinforcement Learning (RL), particularly for safety-critical systems. Although RL performs well in many scenarios, it usually needs to explore a large number of state-action pairs, some of which may be unsafe. Learning effectively while guaranteeing safety is therefore a challenge. The paper proposes a safety layer based on Robust Control Barrier Functions (RCBFs) and embeds it in a model-based RL framework. The approach has three parts:

1. **Safety layer design**: A differentiable RCBF safety layer is proposed. The layer is compatible with standard policy-gradient RL algorithms, supports real-time control synthesis, handles a wide range of perturbation types even if the function is non-affine, and is applicable to multiple systems.
2. **Modular task learning**: A method is proposed for learning the reward-driven task modularly, independent of some of the constraints. This supports zero-shot transfer to environments where the constraints differ between training and testing, for example a drone that must stay within a certain distance of a safe operator.
3. **Improving learning efficiency**:
   - **Differentiable safety layer**: A differentiable optimization framework allows gradients to be back-propagated through the QP, so the RL loss explicitly accounts for the output of the safety layer, which improves learning performance.
   - **Model-based RL**: Where possible, the partially learned dynamics, the reward function, and the RCBF constraints are used to generate short-horizon trajectories, further improving the sample efficiency of SAC-RCBF.
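The safety-layer projection described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes control-affine dynamics `x_dot = f(x) + g(x) u + d` with a norm-bounded disturbance `||d|| <= d_max` (a simplification chosen here for illustration), a single barrier function `h`, and a hypothetical helper name `rcbf_filter`. With one constraint, the QP `min ||u - u_rl||^2` subject to `a @ u >= b` has a closed-form solution, so no QP solver is needed.

```python
import numpy as np


def rcbf_filter(u_rl, x, h, grad_h, f, g, d_max, alpha=1.0):
    """Minimally modify u_rl so that a worst-case barrier condition holds.

    Solves  min_u ||u - u_rl||^2  s.t.  a @ u >= b  in closed form, where the
    single constraint encodes (illustrative, norm-bounded disturbance model):
        grad_h(x) @ (f(x) + g(x) @ u) - ||grad_h(x)|| * d_max >= -alpha * h(x)
    """
    dh = grad_h(x)
    a = g(x).T @ dh  # coefficient of u in the constraint
    b = -alpha * h(x) - dh @ f(x) + np.linalg.norm(dh) * d_max
    violation = b - a @ u_rl
    if violation <= 0.0:
        return u_rl.copy()  # RL action already satisfies the constraint
    denom = a @ a
    if denom < 1e-12:
        raise ValueError("control has no authority over the barrier at this state")
    # Euclidean projection of u_rl onto the half-space {u : a @ u >= b}
    return u_rl + (violation / denom) * a


# Toy example (all values are assumptions): a 2-D single integrator that
# must stay inside the unit disk, so h(x) = 1 - ||x||^2.
h = lambda x: 1.0 - x @ x
grad_h = lambda x: -2.0 * x
f = lambda x: np.zeros(2)
g = lambda x: np.eye(2)

x = np.array([0.9, 0.0])
u_unsafe = np.array([1.0, 0.0])   # pushes toward the boundary
u_inward = np.array([-1.0, 0.0])  # already moves away from the boundary

u_safe = rcbf_filter(u_unsafe, x, h, grad_h, f, g, d_max=0.1)
u_kept = rcbf_filter(u_inward, x, h, grad_h, f, g, d_max=0.1)
print(u_safe, u_kept)
```

In this single-constraint case the projection is piecewise linear in `u_rl`, so gradients can pass through it directly. The paper's general QP instead relies on a differentiable optimization framework for back-propagation.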
Through these methods, the paper aims to keep the system safe during training, guide exploration effectively, improve learning efficiency, and enable transfer across different environments. The experimental results support the effectiveness of the proposed method, especially in terms of sample efficiency and zero-shot transfer.