Abstract: Deep reinforcement learning (DRL) has demonstrated remarkable performance in many continuous control tasks. However, a significant obstacle to the real-world application of DRL is the lack of safety guarantees. Although DRL agents can satisfy system safety in expectation through reward shaping, designing agents that consistently meet hard constraints (e.g., safety specifications) at every time step remains a formidable challenge. In contrast, existing work in the field of safe control provides guarantees on the persistent satisfaction of hard safety constraints. However, these methods require explicit analytical models of the system dynamics to synthesize safe control, and such models are typically inaccessible in DRL settings. In this paper, we present a model-free safe control algorithm, the implicit safe set algorithm, for synthesizing safeguards for DRL agents that ensure provable safety throughout training. The proposed algorithm synthesizes a safety index (barrier certificate) and a subsequent safe control law solely by querying a black-box dynamics function (e.g., a digital twin simulator). Moreover, we theoretically prove that the implicit safe set algorithm guarantees finite-time convergence to the safe set and forward invariance for both continuous-time and discrete-time systems. We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while attaining $95\% \pm 9\%$ cumulative reward relative to state-of-the-art safe DRL methods. Furthermore, the resulting algorithm scales well to high-dimensional systems with parallel computing.
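
To make the idea of a safeguard built only from black-box queries concrete, the following is a minimal illustrative sketch, not the paper's implementation. Everything in it is an assumption introduced for illustration: the distance-based `safety_index`, the single-integrator `black_box_step` standing in for a simulator, the decrease margin `eta`, and the random-sampling search. It filters a nominal action by keeping it if the queried next state satisfies the discrete-time condition phi(x') <= max(phi(x) - eta, 0), and otherwise returns the sampled action closest to the nominal one that does.

```python
import math
import random

def safety_index(state, obstacle=(0.0, 0.0), d_min=1.0):
    """Hypothetical safety index: positive when closer than d_min to the obstacle."""
    return d_min - math.dist(state, obstacle)

def black_box_step(state, action, dt=0.1):
    """Stand-in for a simulator query (assumed 2D single integrator)."""
    return (state[0] + dt * action[0], state[1] + dt * action[1])

def safeguard(state, nominal_action, step=black_box_step, phi=safety_index,
              eta=0.01, n_samples=256, u_max=1.0, rng=None):
    """Return an action whose queried next state satisfies
    phi(next) <= max(phi(state) - eta, 0), preferring the nominal action.
    Returns None if no sampled action passes the check."""
    rng = rng or random.Random(0)
    bound = max(phi(state) - eta, 0.0)

    # Keep the agent's action whenever the simulator says it is safe.
    if phi(step(state, nominal_action)) <= bound:
        return nominal_action

    # Otherwise sample candidate actions, query the black box for each,
    # and keep the safe candidate closest to the nominal action.
    best, best_dist = None, math.inf
    for _ in range(n_samples):
        u = (rng.uniform(-u_max, u_max), rng.uniform(-u_max, u_max))
        if phi(step(state, u)) <= bound:
            d = math.dist(u, nominal_action)
            if d < best_dist:
                best, best_dist = u, d
    return best
```

In this sketch the dynamics are only ever accessed through `step`, so the same filter could in principle wrap any simulator that exposes a one-step query. Uniform random sampling is a simplification chosen for brevity and gives no guarantee of finding a safe action when one exists.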

Safety Margins for Reinforcement Learning

Criticality and Safety Margins for Reinforcement Learning

Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings

Look Before You Leap: Safe Model-Based Reinforcement Learning with Human Intervention

Progressive Adaptive Chance-Constrained Safeguards for Reinforcement Learning

Safe Reinforcement Learning via Hierarchical Adaptive Chance-Constraint Safeguards

Provable Safe Reinforcement Learning with Binary Feedback

Evaluating Model-free Reinforcement Learning Toward Safety-critical Tasks

Context-Aware Safe Reinforcement Learning for Non-Stationary Environments

SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization

Shielding Atari Games with Bounded Prescience

Learning Adaptive Safety for Multi-Agent Systems

AI Safety Gridworlds

Runtime Safety Assurance Using Reinforcement Learning

Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications

Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning

Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving

SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics

Safe Reinforcement Learning by Imagining the Near Future

Adaptive Safety Margin Estimation for Safe Real-Time Replanning under Time-Varying Disturbance

Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning