Abstract: We investigate the important problem of certifying stability of reinforcement learning policies when interconnected with nonlinear dynamical systems. We show that by regulating the input-output gradients of policies, strong guarantees of robust stability can be obtained based on a proposed semidefinite programming feasibility problem. The method is able to certify a large set of stabilizing controllers by exploiting problem-specific structures; furthermore, we analyze and establish its (non)conservatism. Empirical evaluations on two decentralized control tasks, namely multi-flight formation and power system frequency regulation, demonstrate that the reinforcement learning agents can have high performance within the stability-certified parameter space, and also exhibit stable learning behaviors in the long run.

What problem does this paper attempt to address?

The paper addresses how to guarantee stability when a reinforcement learning (RL) policy is interconnected with a nonlinear dynamical system. Specifically, the authors show that regulating the input-output gradients of the policy yields strong guarantees of robust stability, even when deep neural networks are used as controllers to optimize control performance. The guarantees rest on a semidefinite programming feasibility problem, so that the stability of the interconnected system can be analyzed or certified during both the RL exploration and deployment phases.

### Background of the Paper

- **Reinforcement Learning**: Uses (deep) neural networks to solve complex decision-making and control problems.
- **Robust Control**: For mission-critical systems (such as self-driving cars and power grids), safety is of utmost importance. Analyzing or certifying the stability of the interconnected system during the RL exploration and deployment phases is therefore a fundamental problem, but it is challenging because the problem is dynamic and non-convex.

### System Description

The paper considers the general continuous-time dynamical system

\[ \dot{x}(t) = f_t(x(t), u(t)), \]

where \( x(t) \in \mathbb{R}^{n_s} \) is the state and \( u(t) \in \mathbb{R}^{n_a} \) is the control action. For the stability analysis, the dynamics are assumed to take the form

\[ f_t(x(t), u(t)) = Ax(t) + Bu(t) + g_t(x(t)), \]

where \( A \in \mathbb{R}^{n_s\times n_s} \) is a stable linear time-invariant (LTI) component (i.e., the real part of each eigenvalue is strictly less than zero), \( B \in \mathbb{R}^{n_s\times n_a} \) is the control matrix, and \( g_t \) is a slowly time-varying component that allows for nonlinearity and uncertainty.

### Main Contributions

1. **Utilization of Gradient Information**: Stability certificates are built from the gradient information of the policy \( \pi_t(y(t); \theta_t) \). This information can be extracted in real time, and the approach applies to a large class of nonlinear controllers used for performance optimization.
2. **Gradient Bounds**: Lower and upper gradient bounds \( \underline{\xi} \) and \( \overline{\xi} \) are defined. The system is guaranteed to be stable as long as the gradients of the RL policy remain within the "safe set" delimited by these bounds.
3. **Semidefinite Programming**: These gradient bounds can be computed efficiently by solving a semidefinite programming problem.
4. **Theoretical Analysis**: The conservatism of the certificate conditions is analyzed, and the conditions are proved to be necessary in some cases.

### Experimental Verification

The paper reports experiments on two decentralized control tasks: multi-flight formation and power system frequency regulation. The results show that, within the stability-certified parameter space, the RL agents can achieve high performance and also exhibit stable learning behavior in the long run.

### Conclusion

This research provides theoretical guarantees for the use of reinforcement learning in real control systems, especially mission-critical systems that demand high stability and safety. By introducing new quadratic constraints, it significantly broadens the applicability of stability-certified reinforcement learning to large-scale nonlinear systems.
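Two ingredients of the summary above lend themselves to a quick numerical check: the assumption that \( A \) is stable (all eigenvalues have strictly negative real part), and the requirement that the policy's input-output gradients stay between a lower and an upper bound. The following is a minimal sketch, not the paper's implementation. The function names, the toy matrix `A`, the toy `tanh` policy, and the bound values are all illustrative assumptions; in the paper the bounds come from solving a semidefinite program, which is not reproduced here.

```python
import numpy as np


def is_hurwitz(A):
    """Return True if every eigenvalue of A has strictly negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0))


def jacobian_fd(policy, y, eps=1e-6):
    """Central finite-difference Jacobian d(policy)/dy at the point y.

    Hypothetical helper: a real implementation would more likely use
    automatic differentiation on the policy network.
    """
    y = np.asarray(y, dtype=float)
    u0 = np.atleast_1d(policy(y))
    J = np.zeros((u0.size, y.size))
    for j in range(y.size):
        d = np.zeros_like(y)
        d[j] = eps
        J[:, j] = (np.atleast_1d(policy(y + d)) - np.atleast_1d(policy(y - d))) / (2 * eps)
    return J


def within_gradient_bounds(J, lower, upper, tol=1e-6):
    """Check lower <= J <= upper entrywise, up to a small tolerance."""
    return bool(np.all(J >= lower - tol) and np.all(J <= upper + tol))


# Toy stable LTI component (assumed for illustration): eigenvalues -1 and -3.
A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])

# Toy policy u = tanh(W y); its Jacobian is (1 - tanh(W y)^2) * W.
W = np.array([[0.5, -0.2]])
policy = lambda y: np.tanh(W @ y)

# Assumed entrywise gradient bounds (NOT computed from an SDP here).
xi_lower = np.array([[0.0, -0.2]])
xi_upper = np.array([[0.5, 0.0]])

y = np.array([0.3, -0.1])
J = jacobian_fd(policy, y)

print("A is Hurwitz:", is_hurwitz(A))
print("Policy Jacobian at y:", J)
print("Within bounds:", within_gradient_bounds(J, xi_lower, xi_upper))
```

Because the derivative of `tanh` lies in (0, 1], every Jacobian entry of this toy policy is a scaled-down copy of the corresponding entry of `W`, so the check passes at any input `y`. For a general neural network policy, such a check would have to be enforced or monitored during training rather than assumed.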

Stability-certified reinforcement learning: A control-theoretic perspective

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Stochastic Reinforcement Learning with Stability Guarantees for Control of Unknown Nonlinear Systems

Stabilizing reinforcement learning control: A modular framework for optimizing over all stable behavior

Reinforcement Learning Control of Constrained Dynamic Systems with Uniformly Ultimate Boundedness Stability Guarantee

Actor-Critic Reinforcement Learning for Control With Stability Guarantee

A modular framework for stabilizing deep reinforcement learning control

H_∞ Model-free Reinforcement Learning with Robust Stability Guarantee

Reinforcement Learning Policies in Continuous-Time Linear Systems

Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation

Distributionally Robust Policy and Lyapunov-Certificate Learning

Learning Provably Stabilizing Neural Controllers for Discrete-Time Stochastic Systems

Safe Reinforcement Learning via a Model-Free Safety Certifier

Stability Constrained Reinforcement Learning for Decentralized Real-Time Voltage Control

Lyapunov-stable neural-network control

Data-Driven Robust Control of Discrete-Time Uncertain Linear Systems Via Off-Policy Reinforcement Learning

Stability-Certified Learning of Control Systems with Quadratic Nonlinearities

Stabilizing Neural Control Using Self-Learned Almost Lyapunov Critics

Lyapunov-based reinforcement learning for distributed control with stability guarantee

A note on stabilizing reinforcement learning

Reinforcement Learning of Structured Control for Linear Systems with Unknown State Matrix