Abstract: Safety is a major consideration when controlling complex dynamical systems with reinforcement learning (RL), where a safety certificate can provide provable safety guarantees. A valid safety certificate is an energy function that assigns low energy to safe states and for which there exists a corresponding safe control policy that always dissipates the energy. The safety certificate and the safe control policy are closely related, and both are challenging to synthesize. Existing learning-based studies therefore treat one of them as prior knowledge in order to learn the other, which limits their applicability to systems with general unknown dynamics. This paper proposes a novel approach that simultaneously synthesizes the energy-function-based safety certificate and learns the safe control policy with constrained reinforcement learning (CRL). We do not rely on prior knowledge of either an available model-based controller or a perfect safety certificate. In particular, we formulate a loss function that optimizes the safety certificate parameters by minimizing the occurrence of energy increases. By adding this optimization procedure as an outer loop to Lagrangian-based CRL, we jointly update the policy and safety certificate parameters and prove that they converge to their respective local optima: the optimal safe policy and a valid safety certificate. We evaluate our algorithms on multiple safety-critical benchmark environments. The results show that the proposed algorithm learns provably safe policies with no constraint violation. The validity, or feasibility, of the synthesized safety certificate is also verified numerically.
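To make the two coupled updates described in the abstract more concrete, the following is a minimal sketch under stated assumptions, not the paper's actual formulation. It assumes a hinge-style penalty on energy increases over sampled transitions and a standard projected dual-ascent step for the Lagrange multiplier. The function names, the margin `eta`, and the learning rate `lr` are all hypothetical.

```python
from typing import Sequence


def energy_increase_loss(
    energies: Sequence[float],
    next_energies: Sequence[float],
    eta: float = 0.0,
) -> float:
    """Mean hinge penalty over transitions (assumed form of the certificate loss).

    A transition contributes only when the energy fails to decrease by at
    least the hypothetical margin eta, i.e. when next - current + eta > 0.
    """
    terms = [max(0.0, e_next - e + eta) for e, e_next in zip(energies, next_energies)]
    return sum(terms) / len(terms)


def lagrangian_policy_loss(expected_return: float, violation: float, multiplier: float) -> float:
    """Policy loss for Lagrangian-based CRL: negative return plus weighted violation."""
    return -expected_return + multiplier * violation


def update_multiplier(multiplier: float, violation: float, lr: float) -> float:
    """Projected dual ascent: raise the multiplier on violation, keep it non-negative."""
    return max(0.0, multiplier + lr * violation)


# Toy usage: the first transition dissipates energy, the second increases it by 1.0.
loss = energy_increase_loss([1.0, 2.0], [0.5, 3.0])
```

In this sketch the certificate parameters would be updated in the outer loop to reduce `energy_increase_loss`, while the policy and multiplier updates form the inner constrained RL step.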

Reinforcement Learning with Ensemble Model Predictive Safety Certification

Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning

Look Before You Leap: Safe Model-Based Reinforcement Learning with Human Intervention

ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning

Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate

Context-Aware Safe Reinforcement Learning for Non-Stationary Environments

Safe Spacecraft Inspection via Deep Reinforcement Learning and Discrete Control Barrier Functions

Safe Reinforcement Learning via Hierarchical Adaptive Chance-Constraint Safeguards

Safe Reinforcement Learning on Autonomous Vehicles

Safe Model-Based Reinforcement Learning for Systems with Parametric Uncertainties

Probabilistic Counterexample Guidance for Safer Reinforcement Learning (Extended Version)

Learning to be Safe: Deep RL with a Safety Critic

Verifiably Safe Off-Model Reinforcement Learning

Hierarchical Framework for Interpretable and Probabilistic Model-Based Safe Reinforcement Learning

Automata Learning meets Shielding

Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning

Evaluating Model-free Reinforcement Learning Toward Safety-critical Tasks

Iterative Batch Reinforcement Learning via Safe Diversified Model-based Policy Search

Joint Synthesis of Safety Certificate and Safe Control Policy Using Constrained Reinforcement Learning