Reinforcement Learning for Safety-Critical Control under Model Uncertainty, using Control Lyapunov Functions and Control Barrier Functions

Jason Choi, Fernando Castañeda, Claire J. Tomlin, Koushil Sreenath
DOI: https://doi.org/10.48550/arXiv.2004.07584
2020-06-05
Abstract: In this paper, the issue of model uncertainty in safety-critical control is addressed with a data-driven approach. For this purpose, we utilize the structure of an input-output linearization controller based on a nominal model along with a Control Barrier Function and Control Lyapunov Function based Quadratic Program (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework which learns the model uncertainty present in the CBF and CLF constraints, as well as other control-affine dynamic constraints in the quadratic program. The trained policy is combined with the nominal model-based CBF-CLF-QP, resulting in the Reinforcement Learning-based CBF-CLF-QP (RL-CBF-CLF-QP), which addresses the problem of model uncertainty in the safety constraints. The performance of the proposed method is validated by testing it on an underactuated nonlinear bipedal robot walking on randomly spaced stepping stones with one step preview, obtaining stable and safe walking under model uncertainty.
Systems and Control, Machine Learning, Robotics
What problem does this paper attempt to address?
This paper addresses model uncertainty in safety-critical control. It proposes a reinforcement learning (RL) framework that learns and compensates for the model uncertainty in the Control Lyapunov Function (CLF) and Control Barrier Function (CBF) constraints, as well as in the other control-affine dynamic constraints of the quadratic program (QP). The framework aims to combine the advantages of data-driven methods with the stability and safety guarantees of classical model-based control, so that safety-critical control problems can be handled in systems with highly uncertain dynamics.

### The main contributions of the paper include:

1. **Proposing a new RL framework**: The framework learns the model uncertainty in the CLF, the CBF, and other control-affine dynamic constraints simultaneously, in a single learning process.
2. **Expanding the scope of application of the method**: The method applies to outputs and control barrier functions of high relative degree.
3. **Learning the uncertainty of parameterized CBFs**: The learned uncertainty covers CBFs that depend not only on the state but also on other parameters.
4. **Numerical verification**: The method is validated on an underactuated nonlinear hybrid system with significant model uncertainty (a bipedal robot walking on randomly spaced stepping stones).

### The structure of the paper:

- **Introduction**: Introduces the research background and motivation, emphasizing the importance of combining learning methods with classical control theory.
- **Background knowledge**: Explains the basic concepts of input-output linearization, the CLF-based quadratic program, and the CBF-based quadratic program.
- **Method**: Describes step by step how the model uncertainty in the CLF and CBF is learned through RL, and proposes the RL-CBF-CLF-QP framework.
- **Experimental setup**: Describes the simulation setup on the bipedal robot, including two different simulation scenarios.
- **Results**: Presents the experimental results under different model uncertainty conditions, verifying the effectiveness and robustness of the proposed method.

### Key technical details:

- **Input-output linearization**: A feedback controller based on the nominal model linearizes the input-output dynamics of the system.
- **CLF and CBF**: Used to ensure the stability and the safety of the system, respectively.
- **RL framework**: Uses the Deep Deterministic Policy Gradient (DDPG) algorithm to train RL agents that learn the model uncertainty.
- **Quadratic programming**: Incorporates the learned uncertainty into the optimization problem so that the control input satisfies the safety and stability constraints.

### Experimental results:

- **Walking on flat ground**: Under model uncertainty, the proposed RL method maintains stable walking of the robot and satisfies the friction constraints.
- **Walking on stepping stones**: When walking on randomly spaced stepping stones, the RL-CBF-CLF-QP method places the robot's feet safely and adapts to additional uncertainty (such as an increased load).

In conclusion, by combining RL with classical control theory, the paper proposes an effective method for handling model uncertainty in safety-critical control systems, with both theoretical and practical value.
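
The quadratic-programming step described above can be illustrated with a minimal sketch for a single scalar input. This is not the authors' implementation. The function name `cbf_clf_qp_scalar`, the gains `lam`, `gamma`, `p`, and the residual pairs `dV` and `dB` are all assumptions made for illustration. In the paper the uncertainty terms come from a trained RL policy; here they are plain numbers supplied by the caller. Because the input is scalar, the QP has a closed-form solution: the relaxed CLF objective is convex in `u`, so the constrained optimum is the unconstrained minimizer clipped to the half-line allowed by the CBF constraint.

```python
def cbf_clf_qp_scalar(LfV, LgV, V, LfB, LgB, B,
                      lam=1.0, gamma=1.0, p=10.0,
                      dV=(0.0, 0.0), dB=(0.0, 0.0)):
    """Solve  min_{u,d} u^2 + p*d^2  for a scalar input u, subject to
         CLF (relaxed by d): (LfV + dV[0]) + (LgV + dV[1])*u + lam*V   <= d
         CBF (hard):         (LfB + dB[0]) + (LgB + dB[1])*u + gamma*B >= 0
    dV and dB are hypothetical residuals standing in for learned
    uncertainty terms; (0, 0) recovers the nominal model-based QP.
    """
    # CLF constraint written as a + b*u <= d.
    a = LfV + dV[0] + lam * V
    b = LgV + dV[1]
    # Minimizer of u^2 + p*max(0, a + b*u)^2, ignoring the CBF for now.
    u = 0.0 if a <= 0.0 else -p * a * b / (1.0 + p * b * b)

    # CBF constraint written as c + e*u >= 0; clip u onto its half-line.
    c = LfB + dB[0] + gamma * B
    e = LgB + dB[1]
    if e > 0.0:
        u = max(u, -c / e)
    elif e < 0.0:
        u = min(u, -c / e)
    # If e == 0 the constraint does not depend on u (infeasible when c < 0).
    return u


# Toy system x' = u: drive x toward x_goal while keeping x <= x_max.
x, x_goal, x_max = 0.5, 2.0, 1.0
V, LfV, LgV = 0.5 * (x - x_goal) ** 2, 0.0, x - x_goal   # CLF and Lie derivatives
B, LfB, LgB = x_max - x, 0.0, -1.0                        # CBF and Lie derivatives

u_nominal = cbf_clf_qp_scalar(LfV, LgV, V, LfB, LgB, B)
# Assumed residual: an unmodeled drift of 0.2 toward the boundary.
u_learned = cbf_clf_qp_scalar(LfV, LgV, V, LfB, LgB, B, dB=(-0.2, 0.0))
```

In this toy case the residual tightens the largest input the barrier allows from 0.5 to 0.3, which is the sense in which a learned uncertainty term changes the safety constraint rather than the controller structure. A multi-input system would need a general QP solver instead of the closed-form clipping used here.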