Abstract: Satisfying safety constraints is a primary concern when solving optimal control problems (OCPs). Because of the infeasibility phenomenon, in which no constraint-satisfying solution can be found, it is necessary to identify a feasible region before implementing a policy. Existing feasibility theories built for model predictive control (MPC) consider only the feasibility of the optimal policy. However, reinforcement learning (RL), another important control method, solves for the optimal policy iteratively, which produces a series of non-optimal intermediate policies. Feasibility analysis of these non-optimal policies is also necessary for iteratively improving constraint satisfaction, but it is not available under existing MPC feasibility theories. This paper proposes a feasibility theory that applies to both MPC and RL by filling in the missing feasibility analysis for an arbitrary policy. The basis of our theory is to decouple policy solving and implementation into two temporal domains: the virtual-time domain and the real-time domain. This allows us to separately define initial and endless feasibility, for both states and policies, together with their corresponding feasible regions. Based on these definitions, we analyze the containment relationships between different feasible regions, which enables us to describe the feasible region of an arbitrary policy. We further provide virtual-time constraint design rules, along with a practical design tool called the feasibility function, which helps to achieve the maximum feasible region. We review most existing constraint formulations and point out that they are essentially applications of feasibility functions in different forms. We demonstrate our feasibility theory by visualizing different feasible regions under both MPC and RL policies in an emergency braking control task.
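To make the containment idea concrete, the following is a minimal sketch of feasible regions in a simplified emergency braking setting. It is not the paper's formulation: it assumes a continuous-time point mass with state (gap `d`, speed `v`), a safety constraint `d >= 0`, and policies that brake with a constant deceleration. The names `A_MAX`, `A_PI`, `stopping_distance`, `is_feasible`, and `feasible_region`, as well as all numeric values, are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a simplified braking model, not the paper's setup.
# Assumption: point mass, constant-deceleration braking, constraint d >= 0.

A_MAX = 8.0  # hypothetical maximum deceleration (m/s^2)
A_PI = 4.0   # hypothetical weaker, non-optimal policy deceleration (m/s^2)


def stopping_distance(v, decel):
    """Distance covered before stopping from speed v at constant deceleration."""
    if decel <= 0:
        raise ValueError("decel must be positive")
    return v * v / (2.0 * decel)


def is_feasible(d, v, decel):
    """True if braking at `decel` from (d, v) never violates d >= 0."""
    return d >= stopping_distance(v, decel)


def feasible_region(decel, distances, speeds):
    """Grid approximation of the set of states that are feasible under `decel`."""
    return {(d, v) for d in distances for v in speeds if is_feasible(d, v, decel)}


distances = [0.0, 5.0, 10.0, 20.0, 40.0]
speeds = [0.0, 5.0, 10.0, 15.0]

# Region under the strongest braking (largest region in this toy model).
max_region = feasible_region(A_MAX, distances, speeds)
# Region under the weaker policy, which needs a longer gap to stop.
policy_region = feasible_region(A_PI, distances, speeds)

print(policy_region <= max_region)  # subset check on the grid
```

In this toy model the weaker policy's region is contained in the region obtained under maximum braking, and states such as `(10.0, 10.0)` are feasible only under the stronger braking. This mirrors, in a very reduced form, the kind of containment relationship between feasible regions that the abstract describes.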
