Abstract: Reinforcement learning (RL) is promising for complicated stochastic nonlinear control problems. Without using a mathematical model, an optimal controller can be learned from data evaluated by certain performance criteria through trial-and-error. However, the data-based learning approach is notorious for not guaranteeing stability, which is the most fundamental property for any control system. In this paper, the classic Lyapunov's method is explored to analyze the uniformly ultimate boundedness stability (UUB) solely based on data without using a mathematical model. It is further shown how RL with UUB guarantee can be applied to control dynamic systems with safety constraints. Based on the theoretical results, both off-policy and on-policy learning algorithms are proposed respectively. As a result, optimal controllers can be learned to guarantee UUB of the closed-loop system both at convergence and during learning. The proposed algorithms are evaluated on a series of robotic continuous control tasks with safety constraints. In comparison with the existing RL algorithms, the proposed method can achieve superior performance in terms of maintaining safety. As a qualitative evaluation of stability, our method shows impressive resilience even in the presence of external disturbances.

What problem does this paper attempt to address?

This paper addresses the problem of guaranteeing stability in reinforcement learning (RL) control, particularly for dynamic systems with safety constraints. Its goal is a data-based method that learns controllers through RL while guaranteeing uniformly ultimate boundedness (UUB) of the system. This targets a known weakness of conventional RL methods: without a mathematical model, stability is hard to guarantee, especially in control tasks that must satisfy safety constraints.

### Main contributions of the paper

1. **A new data-based UUB theorem**: The theorem does not depend on a mathematical model of the system. Instead, it analyzes UUB stability from sample data, so stability of the control system can be established even when the system model is unknown.
2. **An extended definition of UUB**: The paper extends the classical UUB definition to cover cases with safety constraints, making it applicable to a wider range of control tasks.
3. **Practical algorithms**: Based on the theoretical results, the paper proposes two algorithms: a policy-optimization algorithm, Lyapunov-based Constrained Policy Optimization (LCPO), and an actor-critic algorithm, Lyapunov-based Soft Actor-Critic (LSAC). Both are designed to maintain UUB stability of the system during learning.
4. **Experimental verification**: The paper evaluates the proposed algorithms on a series of high-dimensional continuous control tasks, including motion control of legged robots, robotic arms, and quadrotor drones. The reported results show that the proposed algorithms outperform existing safe RL algorithms and are more robust to perturbations.

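
For orientation, the classical notion that contribution 2 extends can be written down as follows. This is the standard textbook statement of UUB for a deterministic system, given here in discrete time as an assumption. The paper's stochastic, constraint-aware definition is not reproduced in this summary and may differ in form. The symbols $x_t$, $a$, $b$, $c$, and $T$ are introduced only for this illustration.

```latex
% Textbook (deterministic) UUB with ultimate bound b, for a system x_{t+1} = f(x_t).
% Illustrative only: not the paper's extended, safety-constrained definition.
\exists\, b > 0,\ c > 0 \ \text{such that}\ \forall\, a \in (0, c)\ \ \exists\, T = T(a, b) \ge 0:
\qquad
\|x_{t_0}\| \le a \;\Longrightarrow\; \|x_t\| \le b \quad \forall\, t \ge t_0 + T .
```

Read informally: every trajectory that starts within distance $a$ of the origin enters the ball of radius $b$ after at most $T$ steps and stays there, with $T$ independent of the start time $t_0$.
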
### Specific methods for solving problems

- **Construction of the Lyapunov function**: The paper uses a Lyapunov function to prove stability of the system. To make the Lyapunov function usable within the RL framework, it introduces a Lyapunov critic function, Lc, which is updated by minimizing an objective function.
- **Verification of UUB conditions**: The UUB condition (condition (3) in the paper) is verified from sampled data using the Monte Carlo method, so that the system satisfies UUB stability throughout learning.
- **Algorithm design**: The two algorithms, LCPO and LSAC, are designed for the policy optimization and actor-critic frameworks, respectively. They adjust the controller parameters so that the UUB conditions are met during learning, which in turn ensures the stability and safety of the system.

### Conclusion

By combining new theoretical results with practical algorithms, the paper addresses the problem of ensuring system stability in RL control, especially for dynamic systems with safety constraints. This provides theoretical and technical support for applying RL to real engineering control tasks.
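
The idea behind verifying a Lyapunov-type condition from sampled data, as described under "Verification of UUB conditions", can be illustrated with a minimal sketch. It is not the paper's condition (3), which this summary does not state. It is a generic Monte Carlo estimate of an expected-decrease inequality, E[L(s') - L(s) + alpha * L(s)] < 0, evaluated on a toy linear system. The function name `lyapunov_decrease_estimate`, the quadratic candidate `quadratic_lyapunov`, the constant `alpha`, and the toy dynamics are all assumptions made for illustration.

```python
import numpy as np


def quadratic_lyapunov(states):
    """Hypothetical Lyapunov candidate L(s) = ||s||^2, applied row-wise."""
    return np.sum(states ** 2, axis=1)


def lyapunov_decrease_estimate(lyap, states, next_states, alpha=0.1):
    """Monte Carlo estimate of E[L(s') - L(s) + alpha * L(s)].

    A negative value suggests the candidate decreases on average over
    the sampled transitions. This is a generic check, not the paper's
    exact UUB condition.
    """
    l_now = lyap(states)
    l_next = lyap(next_states)
    return float(np.mean(l_next - l_now + alpha * l_now))


# Toy data: sampled transitions of two assumed linear systems with small noise.
rng = np.random.default_rng(0)
states = rng.normal(size=(10_000, 2))
noise = 0.01 * rng.normal(size=states.shape)

stable_next = 0.9 * states + noise     # contracting dynamics
unstable_next = 1.1 * states + noise   # expanding dynamics

stable_estimate = lyapunov_decrease_estimate(
    quadratic_lyapunov, states, stable_next
)
unstable_estimate = lyapunov_decrease_estimate(
    quadratic_lyapunov, states, unstable_next
)

print("stable system estimate is negative:", stable_estimate < 0)
print("unstable system estimate is positive:", unstable_estimate > 0)
```

In this toy setting the estimate is negative for the contracting system and positive for the expanding one. In an RL setting, the transitions would come from the learned policy interacting with the environment, and the candidate function would be a learned critic rather than a fixed quadratic.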

Reinforcement Learning Control of Constrained Dynamic Systems with Uniformly Ultimate Boundedness Stability Guarantee

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Reinforcement Learning-Based Control for Nonlinear Discrete-Time Systems with Unknown Control Directions and Control Constraints

Actor-Critic Reinforcement Learning for Control With Stability Guarantee

Reinforcement Learning for Safe Robot Control using Control Lyapunov Barrier Functions

Robust Safe Reinforcement Learning Control of Unknown Continuous-Time Nonlinear Systems with State Constraints and Disturbances

Stability-certified reinforcement learning: A control-theoretic perspective

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Model-Based Safe Reinforcement Learning With Time-Varying Constraints: Applications to Intelligent Vehicles

Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles

Robust Reinforcement Learning for Risk-Sensitive Linear Quadratic Gaussian Control

Stochastic Reinforcement Learning with Stability Guarantees for Control of Unknown Nonlinear Systems

Closed‐loop stability analysis of deep reinforcement learning controlled systems with experimental validation

Safe Deep Model-Based Reinforcement Learning with Lyapunov Functions

Reinforcement Learning for Safety-Critical Control under Model Uncertainty, using Control Lyapunov Functions and Control Barrier Functions

Robust Near-optimal Control for Constrained Nonlinear System via Integral Reinforcement Learning

Safe Nonlinear Control Using Robust Neural Lyapunov-Barrier Functions

Lyapunov-stable neural-network control

Stabilizing Neural Control Using Self-Learned Almost Lyapunov Critics

Reinforcement Learning Controller Design for Discrete-Time-Constrained Nonlinear Systems With Weight Initialization Method

Safety reinforcement learning control via transfer learning