ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Yarden As, Bhavya Sukhija, Lenart Treven, Carmelo Sferrazza, Stelian Coros, Andreas Krause
2024-10-12
Abstract: Reinforcement learning (RL) is ubiquitous in the development of modern AI systems. However, state-of-the-art RL agents require extensive, and potentially unsafe, interactions with their environments to learn effectively. These limitations confine RL agents to simulated environments, hindering their ability to learn directly in real-world settings. In this work, we present ActSafe, a novel model-based RL algorithm for safe and efficient exploration. ActSafe learns a well-calibrated probabilistic model of the system and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics, while enforcing pessimism w.r.t. the safety constraints. Under regularity assumptions on the constraints and dynamics, we show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time. In addition, we propose a practical variant of ActSafe that builds on the latest model-based RL advancements and enables safe exploration even in high-dimensional settings such as visual control. We empirically show that ActSafe obtains state-of-the-art performance in difficult exploration tasks on standard safe deep RL benchmarks while ensuring safety during learning.
Machine Learning, Robotics
What problem does this paper attempt to address?
The paper addresses how a reinforcement learning (RL) agent can explore effectively while remaining safe. Existing RL algorithms need a large number of interactions with the environment to learn. These interactions are time-consuming and can be unsafe, especially in high-risk real-world applications such as learning a driving policy for an autonomous vehicle. As a result, RL algorithms are largely confined to simulated environments and rarely learn directly in the real world. To overcome these challenges, the paper proposes ActSafe, a model-based RL algorithm for efficient and safe exploration. ActSafe addresses the problem as follows:

1. **Learning a probabilistic model of the system**: ActSafe learns a well-calibrated probabilistic model of the system dynamics and plans optimistically with respect to the epistemic uncertainty about the unknown dynamics, which drives exploration. At the same time, it is pessimistic with respect to the safety constraints, so that the constraints remain satisfied throughout learning.
2. **Theoretical guarantees**: Under regularity assumptions on the constraints and dynamics, ActSafe guarantees safety during learning and obtains a near-optimal policy in finite time. This gives theoretical support for using such an algorithm in real-world applications where both safety and efficiency matter.
3. **Practical variant**: The paper also proposes a practical variant of ActSafe that builds on recent advances in model-based RL and enables safe exploration in high-dimensional settings such as visual control tasks.
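The planning principle in point 1 (optimism about the unknown dynamics, pessimism about the safety constraints) can be illustrated with a toy sketch. It assumes a small ensemble of hypothetical linear dynamics models as a stand-in for the calibrated probabilistic model, exhaustive search over a short horizon of discrete actions, and a made-up reward and cost. An action sequence is scored by its best-case return over the plausible models (optimism) and accepted only if its worst-case cost over those models stays within the budget (pessimism). All names (`ENSEMBLE`, `rollout`, `plan`, `limit`, `budget`) are hypothetical; this is not ActSafe's implementation, which the summary does not detail.

```python
from itertools import product

# Hypothetical ensemble of plausible dynamics models x' = a*x + b*u.
# Disagreement between members stands in for epistemic uncertainty.
ENSEMBLE = [(1.0, 0.8), (1.0, 1.0), (1.0, 1.2)]
ACTIONS = (-1.0, 0.0, 1.0)


def rollout(model, x0, actions):
    """Roll a single model forward and return the visited states (excluding x0)."""
    a, b = model
    states, x = [], x0
    for u in actions:
        x = a * x + b * u
        states.append(x)
    return states


def reward(states):
    # Toy objective (assumption): push the state towards large x.
    return sum(states)


def cost(states, limit):
    # Toy safety cost (assumption): total amount by which states exceed the limit.
    return sum(max(0.0, s - limit) for s in states)


def plan(x0, horizon=3, limit=2.0, budget=0.0, ensemble=ENSEMBLE):
    """Pick the action sequence with the highest optimistic return among those
    whose pessimistic (worst-case) cost is within the budget.
    Returns (None, -inf) if no sequence is feasible."""
    best_seq, best_value = None, float("-inf")
    for seq in product(ACTIONS, repeat=horizon):
        trajectories = [rollout(m, x0, seq) for m in ensemble]
        optimistic_return = max(reward(t) for t in trajectories)
        pessimistic_cost = max(cost(t, limit) for t in trajectories)
        if pessimistic_cost <= budget and optimistic_return > best_value:
            best_seq, best_value = seq, optimistic_return
    return best_seq, best_value
```

With the full ensemble, `plan(0.0)` returns `(1.0, 0.0, 0.0)`: it stops pushing once the most aggressive plausible model (`b = 1.2`) would cross the limit. Planning with only the nominal model, `plan(0.0, ensemble=[(1.0, 1.0)])`, returns the bolder `(1.0, 1.0, 0.0)`, which violates the limit if the true dynamics happen to match the `b = 1.2` member. This contrast is why pessimism is applied to the constraints rather than to a single best-guess model.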
Through these methods, ActSafe aims to bridge the gap between simulation and the real world, enabling RL agents to learn directly in real environments while ensuring safety and sample efficiency. Beyond its theoretical guarantees, ActSafe empirically achieves state-of-the-art performance on difficult exploration tasks from standard safe deep RL benchmarks while ensuring safety during learning.