Abstract: Reinforcement learning (RL) has been successfully applied to a variety of robotics applications, where it outperforms classical methods. However, the safety of RL agents and their transfer to the real world remain open challenges. Safe reinforcement learning is a prominent field that tackles these challenges by ensuring the safety of the agents during both training and execution. Safe RL can be achieved through constrained RL or through safe exploration approaches. The former learns the safety constraints over the course of training so that the agent behaves safely by the end of training, at the cost of a high number of collisions in the early stages of training. The latter offers robust safety by enforcing the safety constraints as hard constraints, which prevents collisions but hinders the exploration of the RL agent, resulting in lower rewards and poor performance. To overcome these drawbacks, we propose a novel safety shield that combines the robustness of optimization-based controllers with the long-horizon prediction capabilities of RL agents, allowing the RL agent to adaptively tune the parameters of the controller. Our approach improves the exploration of RL agents in navigation tasks while minimizing the number of collisions. Experiments in simulation show that our approach outperforms state-of-the-art baselines in the ratio of reached goals to collisions across different challenging environments. This goals-to-collisions ratio emphasizes the importance of minimizing the number of collisions while learning to accomplish the task. Our approach reaches more goals than classic safety shields and incurs fewer collisions than constrained RL approaches. Finally, we demonstrate the performance of the proposed method in a real-world experiment.
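The abstract does not say which optimization-based controller the shield uses, so the following is only a minimal sketch of the general idea under stated assumptions: a 2D single-integrator robot (x_dot = u) among circular obstacles, with a one-step control barrier function (CBF) quadratic program swapped in as the optimization-based controller. The RL policy is assumed to output both a nominal action `u_rl` and the barrier decay rate `alpha`, which plays the role of the controller parameter the agent tunes adaptively. All names (`Obstacle`, `barrier`, `project_halfspace`, `shield`, `alpha`) are hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass

# Hypothetical setup (assumption): 2D single-integrator robot, x_dot = u,
# among circular obstacles. Not the paper's actual model or controller.
Vec = tuple[float, float]


@dataclass
class Obstacle:
    center: Vec
    radius: float


def barrier(x: Vec, obs: Obstacle) -> float:
    """h(x) = ||x - c||^2 - r^2; h(x) >= 0 means x lies outside the obstacle."""
    dx, dy = x[0] - obs.center[0], x[1] - obs.center[1]
    return dx * dx + dy * dy - obs.radius ** 2


def project_halfspace(z: Vec, a: Vec, b: float) -> Vec:
    """Euclidean projection of z onto the half-space {u : a . u >= b}."""
    slack = a[0] * z[0] + a[1] * z[1] - b
    if slack >= 0.0:
        return z  # already feasible, keep as-is
    scale = -slack / (a[0] ** 2 + a[1] ** 2)
    return (z[0] + scale * a[0], z[1] + scale * a[1])


def shield(x: Vec, u_rl: Vec, alpha: float, obstacles: list[Obstacle],
           iters: int = 200, tol: float = 1e-10) -> Vec:
    """Minimally modify the RL action (a CBF-QP):
        argmin ||u - u_rl||^2  s.t.  grad h_i(x) . u >= -alpha * h_i(x)  for all i.

    `alpha` is the hypothetical controller parameter chosen by the RL agent:
    a small alpha keeps the robot away from obstacles, a large alpha lets it
    approach them faster. Assumes x is outside every obstacle and alpha >= 0,
    so u = 0 is always feasible.
    """
    constraints = []
    for obs in obstacles:
        a = (2.0 * (x[0] - obs.center[0]), 2.0 * (x[1] - obs.center[1]))
        constraints.append((a, -alpha * barrier(x, obs)))
    # Dykstra's algorithm: converges to the exact projection of u_rl onto the
    # intersection of the half-spaces, i.e. the solution of the QP above.
    u = u_rl
    corrections = [(0.0, 0.0)] * len(constraints)
    for _ in range(iters):
        u_prev = u
        for i, (a, b) in enumerate(constraints):
            z = (u[0] + corrections[i][0], u[1] + corrections[i][1])
            u = project_halfspace(z, a, b)
            corrections[i] = (z[0] - u[0], z[1] - u[1])
        if math.dist(u, u_prev) < tol:
            break
    return u


if __name__ == "__main__":
    ahead = [Obstacle(center=(2.0, 0.0), radius=1.0)]
    # Conservative parameter: forward velocity is capped at 0.75 * alpha here.
    print(shield((0.0, 0.0), (1.0, 0.0), alpha=0.5, obstacles=ahead))  # (0.375, 0.0)
    # Permissive parameter: the RL action passes through unchanged.
    print(shield((0.0, 0.0), (1.0, 0.0), alpha=2.0, obstacles=ahead))  # (1.0, 0.0)
```

The usage lines illustrate the trade-off the abstract describes: the same nominal action is either slowed down or passed through depending on the parameter, so letting the agent adjust `alpha` per state could, under these assumptions, relax the shield where it is safe to explore and tighten it near obstacles.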

Safe Reinforcement Learning with Nonlinear Dynamics via Model Predictive Shielding

Dynamic Model Predictive Shielding for Provably Safe Reinforcement Learning

Safe Reinforcement Learning via Shielding

Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning

Safe Multi-Agent Reinforcement Learning Via Dynamic Shielding

Look Before You Leap: Safe Model-Based Reinforcement Learning with Human Intervention

Safe Reinforcement Learning via Probabilistic Shields

Dynamic Shielding for Reinforcement Learning in Black-Box Environments

Shielded Planning Guided Data-Efficient and Safe Reinforcement Learning

Learning-Based Shielding for Safe Autonomy under Unknown Dynamics

Barrier-Certified Adaptive Reinforcement Learning with Applications to Brushbot Navigation

Approximate Model-Based Shielding for Safe Reinforcement Learning

Safe Reinforcement Learning via Probabilistic Logic Shields

Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees

Human-Feedback Shield Synthesis for Perceived Safety in Deep Reinforcement Learning

Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning

Shielded Reinforcement Learning for Hybrid Systems

Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning

A Dynamic Safety Shield for Safe and Efficient Reinforcement Learning of Navigation Tasks

Provably Safe Deep Reinforcement Learning for Robotic Manipulation in Human Environments

Safe Controller for Output Feedback Linear Systems using Model-Based Reinforcement Learning