Probabilistic Safeguard for Reinforcement Learning Using Safety Index Guided Gaussian Process Models

Weiye Zhao, Tairan He, Changliu Liu
2023-10-18
Abstract: Safety is one of the biggest concerns in applying reinforcement learning (RL) to the physical world. At its core, it is challenging to ensure that RL agents persistently satisfy a hard state constraint without white-box or black-box dynamics models. This paper presents an integrated model learning and safe control framework to safeguard any agent, where its dynamics are learned as Gaussian processes. The proposed theory provides (i) a novel method to construct an offline dataset for model learning that best achieves safety requirements; (ii) a parameterization rule for the safety index to ensure the existence of safe control; and (iii) a safety guarantee in terms of probabilistic forward invariance when the model is learned using the aforementioned dataset. Simulation results show that our framework guarantees almost zero safety violations on various continuous control tasks.
Robotics
What problem does this paper attempt to address?
This paper addresses the safety problem of applying reinforcement learning (RL) in the physical world. Specifically, it asks how RL agents can persistently satisfy hard state constraints when no white-box or black-box dynamics model is available. To this end, the paper proposes an integrated model-learning and safe-control framework that learns the agent's dynamics as Gaussian process models and ensures safety through three contributions:

1. **Constructing an offline dataset**: a new method for building an offline dataset for model learning that best meets the safety requirements.
2. **Designing a safety index**: a parameterization rule for the safety index that ensures a safe control exists under control limits.
3. **Probabilistic safety assurance**: a guarantee of probabilistic forward invariance when the model is learned from the above dataset.

In simulation, the framework achieves almost zero safety violations across various continuous-control tasks.
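The paper's own algorithms are not reproduced here. As a rough illustration of how a learned Gaussian process model and a safety index can work together as a probabilistic safeguard, the following NumPy sketch uses a hypothetical 1-D point mass driving toward a wall. Everything specific is an assumption for illustration: the unknown drag term `true_drift`, the safe-set-algorithm-style index `phi = d_min - d - k * d_dot` (with `phi <= 0` meaning safe), the discrete-time condition `phi(x_{t+1}) <= max(phi(x_t) - eta, 0)`, the confidence multiplier `beta`, and all parameter values. It does not implement the paper's dataset construction or its safety-index parameterization rule.

```python
import numpy as np


class GP1D:
    """Exact GP regression (RBF kernel) for a scalar function of a scalar input (illustrative)."""

    def __init__(self, X, y, length_scale=1.0, signal_var=1.0, noise_var=1e-4):
        self.X, self.ell, self.sf2 = np.asarray(X, float), length_scale, signal_var
        K = self._k(self.X, self.X) + noise_var * np.eye(len(self.X))
        self.L = np.linalg.cholesky(K)
        # alpha = K^{-1} y, computed with two solves against the Cholesky factor
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, np.asarray(y, float)))

    def _k(self, a, b):
        return self.sf2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / self.ell ** 2)

    def predict(self, xq):
        """Return posterior mean and standard deviation at the query points."""
        xq = np.atleast_1d(np.asarray(xq, float))
        ks = self._k(xq, self.X)                      # cross-covariance, shape (m, n)
        mean = ks @ self.alpha
        v = np.linalg.solve(self.L, ks.T)             # shape (n, m)
        var = np.clip(self.sf2 - np.sum(v ** 2, axis=0), 0.0, None)
        return mean, np.sqrt(var)


def safety_index(p, v, wall, d_min, k):
    # Hypothetical index: d = wall - p, d_dot = -v, so phi = d_min - d + k * v.
    return d_min - (wall - p) + k * v


def safeguard(p, v, u_ref, gp, wall=5.0, d_min=1.0, k=1.0, eta=0.05,
              beta=2.0, dt=0.05, u_lim=5.0):
    """Clip u_ref so that mean + beta * std of phi(x_{t+1}) <= max(phi(x_t) - eta, 0)."""
    mu, sd = gp.predict(v)
    mu, sd = mu[0], sd[0]
    target = max(safety_index(p, v, wall, d_min, k) - eta, 0.0)
    # phi_{t+1} = d_min - wall + p + dt*v + k*(v + dt*(u + h(v))) is affine in u and h(v),
    # so with h(v) ~ N(mu, sd^2) its upper confidence bound is const + k*dt*u.
    const = d_min - wall + p + dt * v + k * (v + dt * mu) + beta * k * dt * sd
    u_safe_max = (target - const) / (k * dt)
    # If u_safe_max < -u_lim the condition is infeasible; this sketch just brakes fully.
    return float(np.clip(min(u_ref, u_safe_max), -u_lim, u_lim))


def true_drift(v):
    return -0.5 * v  # hypothetical unknown drag term that the GP has to learn


def rollout(gp, use_safeguard, steps=200, dt=0.05, wall=5.0, u_ref=1.0):
    """Simulate the point mass and return the minimum distance to the wall."""
    p, v, min_d = 0.0, 1.0, np.inf
    for _ in range(steps):
        u = safeguard(p, v, u_ref, gp, wall=wall, dt=dt) if use_safeguard else u_ref
        p, v = p + dt * v, v + dt * (u + true_drift(v))  # explicit Euler, old v in both
        min_d = min(min_d, wall - p)
    return min_d


rng = np.random.default_rng(0)
V_train = np.linspace(-3.0, 3.0, 30)  # hypothetical offline dataset of velocities
gp = GP1D(V_train, true_drift(V_train) + 0.01 * rng.standard_normal(30))
print(rollout(gp, use_safeguard=False))  # nominal controller drives through the wall
print(rollout(gp, use_safeguard=True))   # safeguarded run stays near or above d_min = 1.0
```

The design choice worth noting is that the next-step safety index is affine in both the control and the GP-predicted drift, so a Gaussian upper confidence bound turns the probabilistic condition into a single linear bound on `u`. This is only a toy analogue of the idea; the paper's guarantees rest on its dataset construction and safety-index parameterization, which are not modeled here.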