Abstract: Safety is the major consideration in controlling complex dynamical systems using reinforcement learning (RL), where a safety certificate can provide provable safety guarantees. A valid safety certificate is an energy function under which safe states have low energy, and for which there exists a corresponding safe control policy that makes the energy always dissipate. The safety certificate and the safe control policy are closely related, and both are challenging to synthesize. Therefore, existing learning-based studies treat one of them as prior knowledge in order to learn the other, which limits their applicability to systems with general unknown dynamics. This paper proposes a novel approach that simultaneously synthesizes the energy-function-based safety certificate and learns the safe control policy with constrained reinforcement learning (CRL). We do not rely on prior knowledge of either an available model-based controller or a perfect safety certificate. In particular, we formulate a loss function that optimizes the safety certificate parameters by minimizing the occurrence of energy increases. By adding this optimization procedure as an outer loop to Lagrangian-based CRL, we jointly update the policy and safety certificate parameters and prove that they converge to their respective local optima: the optimal safe policy and a valid safety certificate. We evaluate our algorithms on multiple safety-critical benchmark environments. The results show that the proposed algorithm learns provably safe policies with no constraint violations. The validity, or feasibility, of the synthesized safety certificate is also verified numerically.
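The abstract describes the joint scheme only in words. The following is a minimal sketch under loudly stated assumptions, not the authors' algorithm. It assumes a hypothetical one-dimensional obstacle-avoidance task with energy function phi_k(s) = d_min - d - k * d_dot, where k is the certificate parameter. It also assumes a hinge surrogate that penalizes transitions where the energy fails to dissipate, i.e. phi(s') > max(phi(s) - eta, 0). Projected dual ascent on one Lagrange multiplier stands in for the inner CRL loop, with the policy update omitted. Finite-difference gradient descent on k plays the outer loop. All names (`energy`, `certificate_loss`, `dual_ascent`, `certificate_step`, `joint_synthesis`, `k`, `eta`, `d_min`) are illustrative and hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transition:
    """One sampled transition (s, s') of a hypothetical 1-D obstacle-avoidance task."""
    d: float           # distance to the obstacle at time t
    d_dot: float       # rate of change of that distance at time t
    d_next: float      # distance at time t + 1
    d_dot_next: float  # rate of change at time t + 1


def energy(d: float, d_dot: float, k: float, d_min: float = 1.0) -> float:
    """Hypothetical energy phi_k(s) = d_min - d - k * d_dot; phi_k(s) <= 0 reads as safe."""
    return d_min - d - k * d_dot


def certificate_loss(batch: List[Transition], k: float, eta: float = 0.05) -> float:
    """Assumed hinge surrogate for the occurrence of energy increases.

    A transition violates dissipation when phi(s') > max(phi(s) - eta, 0).
    """
    total = 0.0
    for tr in batch:
        phi = energy(tr.d, tr.d_dot, k)
        phi_next = energy(tr.d_next, tr.d_dot_next, k)
        total += max(0.0, phi_next - max(phi - eta, 0.0))
    return total / len(batch)


def dual_ascent(lam: float, violation: float, lr: float = 0.1) -> float:
    """Projected gradient ascent on the Lagrange multiplier, kept non-negative."""
    return max(0.0, lam + lr * violation)


def certificate_step(batch: List[Transition], k: float,
                     lr: float = 0.5, h: float = 1e-4) -> float:
    """Outer-loop update: central finite-difference gradient descent on k (kept >= 0)."""
    grad = (certificate_loss(batch, k + h) - certificate_loss(batch, k - h)) / (2 * h)
    return max(0.0, k - lr * grad)


def joint_synthesis(batch: List[Transition], steps: int = 200) -> Tuple[float, float]:
    k, lam = 0.0, 0.0
    for _ in range(steps):
        # Inner loop (Lagrangian CRL), simplified: only the multiplier is updated.
        # A full method would also take a policy step on the Lagrangian and
        # collect fresh transitions with the updated policy.
        lam = dual_ascent(lam, certificate_loss(batch, k))
        # Outer loop: adjust the certificate parameter to reduce energy increases.
        k = certificate_step(batch, k)
    return k, lam


batch = [
    Transition(d=0.8, d_dot=-0.2, d_next=0.7, d_dot_next=-0.1),  # approaching, decelerating
    Transition(d=2.0, d_dot=0.0, d_next=2.0, d_dot_next=0.0),    # far away, at rest
]
print(certificate_loss(batch, k=0.0))  # k = 0 ignores velocity, so energy rises
k_final, lam_final = joint_synthesis(batch)
print(k_final, lam_final, certificate_loss(batch, k_final))
```

With k = 0 the energy ignores the approach speed, so the decelerating approach in the first transition still raises the energy. The outer loop raises k until this batch contains no energy increases, while the multiplier accumulates the violations observed along the way.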
