Progressive Adaptive Chance-Constrained Safeguards for Reinforcement Learning.

Zhaorun Chen, Zhuokai Zhao, Tairan He, Binhao Chen, Xuhao Zhao, Liang Gong, Chengliang Liu
DOI: https://doi.org/10.48550/arxiv.2310.03379
2023-01-01
Abstract: Ensuring safety in Reinforcement Learning (RL), typically framed as a Constrained Markov Decision Process (CMDP), is crucial for real-world exploration applications. Current approaches to handling CMDPs struggle to balance optimality and feasibility, as direct optimization methods cannot ensure state-wise in-training safety, and projection-based methods correct actions inefficiently through lengthy iterations. To address these challenges, we propose Adaptive Chance-constrained Safeguards (ACS), an adaptive, model-free safe RL algorithm using the safety recovery rate as a surrogate chance constraint to iteratively ensure safety during exploration and after achieving convergence. Theoretical analysis indicates that the relaxed probabilistic constraint sufficiently guarantees forward invariance to the safe set. Extensive experiments conducted on both simulated and real-world safety-critical tasks demonstrate its effectiveness in enforcing safety (nearly zero-violation) while preserving optimality (+23.8%) ... response in stochastic real-world settings.
What problem does this paper attempt to address?