Stability-certified reinforcement learning: A control-theoretic perspective

Ming Jin, Javad Lavaei
DOI: https://doi.org/10.48550/arXiv.1810.11505
2018-10-27
Abstract: We investigate the important problem of certifying stability of reinforcement learning policies when interconnected with nonlinear dynamical systems. We show that by regulating the input-output gradients of policies, strong guarantees of robust stability can be obtained based on a proposed semidefinite programming feasibility problem. The method is able to certify a large set of stabilizing controllers by exploiting problem-specific structures; furthermore, we analyze and establish its (non)conservatism. Empirical evaluations on two decentralized control tasks, namely multi-flight formation and power system frequency regulation, demonstrate that the reinforcement learning agents can have high performance within the stability-certified parameter space, and also exhibit stable learning behaviors in the long run.
Systems and Control, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to guarantee stability when a reinforcement learning (RL) policy is interconnected with a nonlinear dynamical system. Specifically, the authors show that regulating the input-output gradients of the policy yields strong guarantees of robust stability, while still allowing deep neural networks to serve as controllers that optimize control performance. The guarantees come from a semidefinite programming (SDP) feasibility problem, so the stability of the interconnected system can be analyzed or certified during both the RL exploration and deployment phases.

### Background of the Paper

- **Reinforcement Learning**: Uses (deep) neural networks to solve complex decision-making and control problems.
- **Robust Control**: For mission-critical systems (such as self-driving cars and power grids), safety is of utmost importance. Analyzing or certifying the stability of the interconnected system during the RL exploration and deployment phases is therefore a fundamental problem, but it is challenging because the problem is dynamic and non-convex.

### System Description

The paper considers a general continuous-time dynamical system:

\[ \dot{x}(t) = f_t(x(t), u(t)), \]

where \( x(t) \in \mathbb{R}^{n_s} \) is the state and \( u(t) \in \mathbb{R}^{n_a} \) is the control action. For the stability analysis, the dynamics are assumed to take the following form:

\[ f_t(x(t), u(t)) = Ax(t) + Bu(t) + g_t(x(t)), \]

where \( A \in \mathbb{R}^{n_s \times n_s} \) is the state matrix of a stable linear time-invariant (LTI) component (i.e., every eigenvalue of \( A \) has a strictly negative real part), \( B \in \mathbb{R}^{n_s \times n_a} \) is the control matrix, and \( g_t \) is a slowly time-varying component that allows for nonlinearity and uncertainty.

### Main Contributions

1. **Utilization of Gradient Information**: Stability certificates are derived from the gradient information of the policy \( \pi_t(y(t); \theta_t) \). This information can be extracted in real time, and the approach applies to a large class of nonlinear controllers used for performance optimization.
2. **Gradient Bounds**: Lower and upper gradient bounds \( \underline{\xi} \) and \( \overline{\xi} \) are defined. The system is guaranteed to be stable as long as the gradients of the RL policy remain within the "safe set" defined by these bounds.
3. **Semidefinite Programming**: These gradient bounds can be computed efficiently by solving a semidefinite programming problem.
4. **Theoretical Analysis**: The conservatism of the certificate condition is analyzed, and the condition is proved to be necessary in some cases.

### Experimental Verification

The paper evaluates the method on two decentralized control tasks: multi-flight formation and power system frequency regulation. The results show that, within the stability-certified parameter space, the RL agents achieve high performance and exhibit stable learning behavior in the long run.

### Conclusion

This research provides theoretical guarantees for using reinforcement learning in real control systems, especially mission-critical systems that require high stability and safety. By introducing new quadratic constraints, it extends the applicability of stability-certified reinforcement learning to large-scale nonlinear systems.
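To make the stability condition on \( A \) and the gradient "safe set" from the Main Contributions more concrete, the following is a minimal sketch rather than the paper's method. It assumes a hypothetical two-layer tanh policy, a 2x2 matrix \( A \), and hand-picked bounds `xi_lower` and `xi_upper`; in the paper such bounds would come from the SDP, which is not reproduced here. All names and numbers are illustrative assumptions.

```python
import math

def is_hurwitz_2x2(A):
    """Return True if both eigenvalues of a real 2x2 matrix have negative real part.

    For a 2x2 matrix this is equivalent to trace(A) < 0 and det(A) > 0.
    """
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return trace < 0 and det > 0

def policy(y, theta):
    """Hypothetical policy u = W2 * tanh(W1 * y); not the architecture from the paper."""
    W1, W2 = theta
    hidden = [math.tanh(sum(w * yj for w, yj in zip(row, y))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def jacobian(y, theta, eps=1e-6):
    """Central finite-difference estimate of J[i][j] = d u_i / d y_j at y."""
    columns = []
    for j in range(len(y)):
        y_plus, y_minus = list(y), list(y)
        y_plus[j] += eps
        y_minus[j] -= eps
        u_plus, u_minus = policy(y_plus, theta), policy(y_minus, theta)
        columns.append([(p - m) / (2 * eps) for p, m in zip(u_plus, u_minus)])
    return [list(row) for row in zip(*columns)]  # transpose so rows index outputs

def within_bounds(J, lower, upper):
    """Check lower[i][j] <= J[i][j] <= upper[i][j] for every entry."""
    return all(
        lo <= g <= hi
        for J_row, lo_row, hi_row in zip(J, lower, upper)
        for g, lo, hi in zip(J_row, lo_row, hi_row)
    )

# Illustrative numbers only (assumptions, not taken from the paper).
A_stable = [[-1.0, 0.5], [0.0, -2.0]]
theta = ([[0.5, -0.2], [0.1, 0.3]], [[0.4, -0.6]])  # (W1, W2)
xi_lower = [[-0.1, -0.3]]   # assumed lower gradient bounds
xi_upper = [[0.25, 0.05]]   # assumed upper gradient bounds
samples = [[0.0, 0.0], [1.0, -1.0], [-2.0, 0.5]]

all_ok = all(within_bounds(jacobian(y, theta), xi_lower, xi_upper) for y in samples)
print("A is Hurwitz:", is_hurwitz_2x2(A_stable))
print("sampled gradients inside [xi_lower, xi_upper]:", all_ok)
```

Checking the Jacobian at a few sample points is only a spot check, not a certificate. It illustrates what it means for a policy's input-output gradients to stay inside a box of bounds, which is the quantity the stability certificate constrains.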