State-wise Safe Reinforcement Learning: A Survey

Weiye Zhao, Tairan He, Rui Chen, Tianhao Wei, Changliu Liu
2023-07-01
Abstract: Despite the tremendous success of Reinforcement Learning (RL) algorithms in simulation environments, applying RL to real-world applications still faces many challenges. A major concern is safety, in other words, constraint satisfaction. State-wise constraints are one of the most common constraints in real-world applications and one of the most challenging constraints in Safe RL. Enforcing state-wise constraints is necessary and essential to many challenging tasks such as autonomous driving and robot manipulation. This paper provides a comprehensive review of existing approaches that address state-wise constraints in RL. Under the framework of State-wise Constrained Markov Decision Process (SCMDP), we will discuss the connections, differences, and trade-offs of existing approaches in terms of (i) safety guarantee and scalability, (ii) safety and reward performance, and (iii) safety after convergence and during training. We also summarize limitations of current methods and discuss potential future directions.
Machine Learning, Artificial Intelligence, Robotics
What problem does this paper attempt to address?
This paper addresses the problem of guaranteeing safety when reinforcement learning (RL) is applied to real-world tasks. Although RL has achieved remarkable success in simulated environments, ensuring safety remains a major challenge in practical applications, especially in tasks that must strictly satisfy safety constraints, such as autonomous driving and robot manipulation. Specifically, the paper focuses on state-wise constraints, that is, safety constraints that must be satisfied at every time step, which are among the most common and most challenging constraint types in safe reinforcement learning (Safe RL).

The main objective of the paper is to provide a comprehensive review of existing methods for handling state-wise constraints. Under the framework of the State-wise Constrained Markov Decision Process (SCMDP), it discusses the connections, differences, and trade-offs among these methods in three respects:

1. **Safety Guarantee and Scalability**: the trade-off between the strength of a method's safety guarantee and its scalability.
2. **Safety and Reward Performance**: how to maximize reward while still ensuring safety.
3. **Safety during Training and after Convergence**: whether a method maintains safety throughout the training process or only after the algorithm has converged.

In addition, the paper summarizes the limitations of current methods and discusses possible future research directions. By examining these issues, it aims to advance safe reinforcement learning under state-wise constraints and bring it closer to practical real-world application.
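
To make the notion of a state-wise constraint concrete, the following is a minimal Python sketch rather than anything taken from the paper. It checks a trajectory's per-step safety costs against a limit at every time step. The function names, the cost values, and the limits are all hypothetical, and the cumulative-budget check is included only as a contrast that the text above does not describe.

```python
from typing import Sequence


def satisfies_statewise(costs: Sequence[float], limit: float) -> bool:
    """Return True only if the cost at every time step is within the limit."""
    return all(c <= limit for c in costs)


def satisfies_cumulative(costs: Sequence[float], budget: float) -> bool:
    """Return True if the total cost over the trajectory is within the budget.

    Assumption: shown only for contrast with the state-wise check.
    """
    return sum(costs) <= budget


# Hypothetical per-step safety costs for one trajectory (illustrative numbers).
trajectory_costs = [0.1, 0.0, 0.9, 0.1]

# The third step (0.9) exceeds the per-step limit, so the state-wise check fails
# even though the total cost (about 1.1) is well inside the cumulative budget.
statewise_ok = satisfies_statewise(trajectory_costs, limit=0.5)
cumulative_ok = satisfies_cumulative(trajectory_costs, budget=2.0)
print(statewise_ok, cumulative_ok)  # prints: False True
```

A single unsafe step is enough to violate a state-wise constraint, which is one way to see why such constraints are demanding in tasks like autonomous driving.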