Abstract: In this paper, we consider the problem of learning safe policies for probabilistic-constrained reinforcement learning (RL). Specifically, a safe policy or controller is one that, with high probability, keeps the agent's trajectory within a given safe set. We establish a connection between this probabilistic-constrained setting and the cumulative-constrained formulation frequently explored in the existing literature, and we provide theoretical bounds showing that the probabilistic-constrained setting offers a better trade-off between optimality and safety (constraint satisfaction). The main challenge in handling probabilistic constraints is the absence of explicit expressions for their gradients. Our prior work provides such an explicit gradient expression, which we term Safe Policy Gradient-REINFORCE (SPG-REINFORCE). In this work, we provide an improved gradient, SPG-Actor-Critic, whose variance is lower than that of SPG-REINFORCE, as substantiated by our theoretical results. Both SPGs are algorithm-independent, so they can be applied across a range of policy-based algorithms. Furthermore, we propose a Safe Primal-Dual algorithm that can leverage both SPGs to learn safe policies. We then analyze the algorithm theoretically, establishing its convergence as well as its near-optimality and feasibility on average. Finally, we test the proposed approaches in a series of empirical experiments that examine the trade-offs between optimality and safety and substantiate the efficacy of the two SPGs as well as our theoretical contributions.
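To make the two notions of safety concrete, the following is a minimal sketch on a toy problem, not the method of the paper. It assumes a hypothetical one-dimensional random walk with a safe set of the form |s| <= bound, and estimates by Monte Carlo (i) the probability that an entire trajectory stays in the safe set, which is the quantity a probabilistic constraint bounds from below, and (ii) the expected number of unsafe steps per episode, which is one possible cumulative-style quantity. All names and parameters (`rollout`, `estimate_safety`, `step_std`, `bound`) are illustrative assumptions. By the union bound, the probability of leaving the safe set at least once is at most the expected number of unsafe steps, so the two estimates are related but not interchangeable.

```python
import random


def rollout(horizon, step_std, rng):
    """Simulate a 1-D random walk from 0 (hypothetical toy dynamics)."""
    state, states = 0.0, []
    for _ in range(horizon):
        state += rng.gauss(0.0, step_std)
        states.append(state)
    return states


def estimate_safety(num_episodes=5000, horizon=10, step_std=0.3,
                    bound=2.0, seed=0):
    """Monte Carlo estimates of trajectory-level and cumulative safety.

    Returns (p_safe, mean_unsafe_steps), where p_safe is the fraction of
    episodes whose whole trajectory stays in the assumed safe set
    {s : |s| <= bound}, and mean_unsafe_steps is the average number of
    time steps spent outside it per episode.
    """
    rng = random.Random(seed)
    safe_episodes = 0
    unsafe_steps = 0
    for _ in range(num_episodes):
        # One flag per time step: True if the state is outside the safe set.
        flags = [abs(s) > bound for s in rollout(horizon, step_std, rng)]
        unsafe_steps += sum(flags)
        safe_episodes += not any(flags)
    return safe_episodes / num_episodes, unsafe_steps / num_episodes


p_safe, mean_unsafe_steps = estimate_safety()
print(f"P(trajectory stays safe)   ~ {p_safe:.4f}")
print(f"E[number of unsafe steps]  ~ {mean_unsafe_steps:.4f}")
```

In this sketch the gap between `1 - p_safe` and `mean_unsafe_steps` grows when unsafe steps cluster within the same episodes, which is one informal way to see why a constraint on the cumulative quantity can be more conservative than a constraint on the trajectory-level probability.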
