Abstract: Safety is a critical concern when applying reinforcement learning (RL) to real-world control tasks. However, existing safe RL methods either consider only expected safety constraint violations and thus fail to maintain safety guarantees, or use overly conservative safety certificate tools borrowed from safe control theory, which sacrifice reward optimization and rely on analytic system models. This letter proposes a model-free safe RL algorithm that achieves near-zero constraint violations with high rewards. Our key idea is to jointly learn a policy and a neural barrier certificate under a stepwise state constraint setting. The barrier certificate is learned in a model-free manner by minimizing the violations of appropriate barrier properties on transition data collected by the policy. We extend the single-step invariant property of the barrier certificate to a multi-step version and construct the corresponding multi-step invariant loss. This loss balances the bias and variance of the barrier certificate and enhances both the safety and performance of the policy. The policy is optimized under the constraint of the multi-step invariant property using the Lagrangian method. We optimize the policy in a model-free manner by introducing an importance sampling weight in the constraint. We test our algorithm on multiple problems, including classic control tasks, robot collision avoidance, and autonomous driving. Results show that our algorithm achieves near-zero constraint violations and high performance compared to the baselines. Moreover, the learned barrier certificates successfully identify the feasible regions on multiple tasks.
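
To make the idea of "minimizing the violations of barrier properties on transition data" more concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state the exact conditions, so several things here are assumptions: the sign convention (B(x) <= 0 on states labeled safe, B(x) > 0 on constraint-violating states), the multi-step invariant form B(x_{t+k}) <= (1 - lam)^k B(x_t), the hinge penalty, and every name (`barrier_loss`, `multi_step_invariant_loss`, `toy_barrier`, `lam`, `margin`). The importance sampling weight and the Lagrangian policy update are left out.

```python
from typing import Callable, Sequence

State = Sequence[float]
Barrier = Callable[[State], float]


def hinge(value: float) -> float:
    """Penalty that is zero when value <= 0 and grows linearly otherwise."""
    return max(0.0, value)


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean, defined as 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0


def multi_step_invariant_loss(
    barrier: Barrier, trajectory: Sequence[State], k: int, lam: float
) -> float:
    """Mean violation of the ASSUMED condition B(x_{t+k}) <= (1 - lam)^k * B(x_t)."""
    decay = (1.0 - lam) ** k
    violations = [
        hinge(barrier(trajectory[t + k]) - decay * barrier(trajectory[t]))
        for t in range(len(trajectory) - k)  # empty if the trajectory is too short
    ]
    return mean(violations)


def barrier_loss(
    barrier: Barrier,
    safe_states: Sequence[State],
    unsafe_states: Sequence[State],
    trajectory: Sequence[State],
    k: int = 2,
    lam: float = 0.1,
    margin: float = 1e-2,
) -> float:
    """Sum of three hinge penalties (assumed, unweighted combination)."""
    # States labeled safe should satisfy B(x) <= 0.
    safe_term = mean([hinge(barrier(x)) for x in safe_states])
    # Constraint-violating states should satisfy B(x) >= margin > 0.
    unsafe_term = mean([hinge(margin - barrier(x)) for x in unsafe_states])
    # Transitions collected by the policy should satisfy the multi-step invariant.
    invariant_term = multi_step_invariant_loss(barrier, trajectory, k, lam)
    return safe_term + unsafe_term + invariant_term


def toy_barrier(x: State) -> float:
    """Hypothetical hand-written certificate for a 1-D state: safe iff |x| <= 1."""
    return abs(x[0]) - 1.0


if __name__ == "__main__":
    contracting = [[0.5], [0.4], [0.3], [0.2]]  # moves away from the boundary
    drifting = [[0.2], [0.5], [0.9]]            # moves toward the boundary
    print(multi_step_invariant_loss(toy_barrier, contracting, k=2, lam=0.1))
    print(multi_step_invariant_loss(toy_barrier, drifting, k=2, lam=0.1))
```

In this sketch a larger `k` compares states further apart along the trajectory, which is one way to read the abstract's remark that the multi-step loss trades off bias and variance. In the setting the abstract describes, `barrier` would be a neural network trained by gradient descent on such a loss, rather than a fixed function.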

Sablas: Learning Safe Control for Black-Box Dynamical Systems

Safe Reinforcement Learning for Dynamical Systems Using Barrier Certificates

Learning Observation-Based Certifiable Safe Policy for Decentralized Multi-Robot Navigation

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Learning for Safety-Critical Control with Control Barrier Functions

Sample-efficient Safe Learning for Online Nonlinear Control with Control Barrier Functions

Reinforcement Learning for Safe Robot Control using Control Lyapunov Barrier Functions

Safe Control With Learned Certificates: A Survey of Neural Lyapunov, Barrier, and Contraction Methods for Robotics and Control

State-action control barrier functions: Imposing safety on learning-based control with low online computational costs

Learning a Better Control Barrier Function Under Uncertain Dynamics

Safe Online Dynamics Learning with Initially Unknown Models and Infeasible Safety Certificates

Barrier Certified Safety Learning Control: When Sum-of-Square Programming Meets Reinforcement Learning

Model-Free Safe Reinforcement Learning Through Neural Barrier Certificate

Disturbance Observer-based Control Barrier Functions with Residual Model Learning for Safe Reinforcement Learning

Learning Local Control Barrier Functions for Hybrid Systems

Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions

Safety-Aware Preference-Based Learning for Safety-Critical Control

Safe Reinforcement Learning Using Robust Control Barrier Functions

Transfer of Safety Controllers Through Learning Deep Inverse Dynamics Model