Reinforcement Learning-based Receding Horizon Control using Adaptive Control Barrier Functions for Safety-Critical Systems

Ehsan Sabouni, H.M. Sabbir Ahmad, Vittorio Giammarino, Christos G. Cassandras, Ioannis Ch. Paschalidis, Wenchao Li
2024-03-26
Abstract: Optimal control methods provide solutions to safety-critical problems but easily become intractable. Control Barrier Functions (CBFs) have emerged as a popular technique that facilitates their solution by provably guaranteeing safety, through their forward invariance property, at the expense of some performance loss. This approach involves defining a performance objective alongside CBF-based safety constraints that must always be enforced. Unfortunately, both performance and solution feasibility can be significantly impacted by two key factors: (i) the selection of the cost function and associated parameters, and (ii) the calibration of parameters within the CBF-based constraints, which capture the trade-off between performance and conservativeness. To address these challenges, we propose a Reinforcement Learning (RL)-based Receding Horizon Control (RHC) approach leveraging Model Predictive Control (MPC) with CBFs (MPC-CBF). In particular, we parameterize our controller and use bilevel optimization, where RL is used to learn the optimal parameters while MPC computes the optimal control input. We validate our method by applying it to the challenging automated merging control problem for Connected and Automated Vehicles (CAVs) at conflicting roadways. Results demonstrate improved performance and a significant reduction in the number of infeasible cases compared to traditional heuristic approaches used for tuning CBF-based controllers, showcasing the effectiveness of the proposed method.
Systems and Control, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses control problems in safety-critical systems, in particular the safe control of Connected and Automated Vehicles (CAVs) at conflict points in a traffic network (merging lanes, unsignalized intersections, roundabouts, and similar). It focuses on two main challenges:

1. **Impact on performance and feasibility**: Traditional approaches such as Quadratic Programming with Control Barrier Functions (QP-CBF) can ensure safety, but they are sensitive to the choice of cost function and to parameter calibration, which can cause performance loss or infeasible problems. Parameters are usually tuned heuristically, which may produce sub-optimal controller responses and can even lead to safety violations.
2. **Adaptive adjustment of control parameters**: Balancing performance against conservativeness requires adapting the parameters in the Control Barrier Function (CBF) constraints. Existing methods such as Adaptive CBF (AdaCBF) are difficult to apply in practice because they require defining "penalty terms" and their corresponding dynamics, and they offer no guidance on how to tune the CBF constraints so that the system response improves while safety is preserved.

To address these problems, the paper proposes a Reinforcement Learning (RL)-based Receding Horizon Control (RHC) method that combines Model Predictive Control (MPC) with CBFs. Its specific contributions are:

- **Parameterized MPC controller**: A parameterized MPC controller addresses the infeasibility encountered with the QP-CBF method. RL learns the parameters of the MPC objective function and of the CBF constraints, balancing the trade-off between safety and performance.
- **Computational efficiency**: The method does not need to back-propagate through the MPC-CBF controller, which reduces the computational cost of training.
- **Multi-CAV control problem**: The paper considers both longitudinal and lateral motion control of CAVs on merging roadways. Controller parameters are trained for a single CAV and then applied to a group of homogeneous CAVs at deployment, which demonstrates that the learned controller generalizes.

In summary, by combining RL with MPC-CBF, the paper provides a method for tuning controllers in safety-critical systems. In the CAV merging scenario, the abstract reports improved performance and significantly fewer infeasible cases than heuristically tuned CBF-based controllers.
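
The bilevel idea summarized above (an outer loop chooses controller parameters, an inner controller computes a safe input) can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the plant is a 1D single integrator `x' = u`, the safe set is `h(x) = x_max - x >= 0`, the inner controller is a one-step CBF filter with a closed-form solution rather than an MPC, and the outer loop is a plain grid search standing in for RL. The names `cbf_filter`, `rollout`, and `tune_alpha` are hypothetical.

```python
def cbf_filter(x, u_ref, alpha, x_max):
    """Closed-form solution of: min (u - u_ref)^2  s.t.  -u + alpha * (x_max - x) >= 0.

    The constraint is the CBF condition dh/dt + alpha * h >= 0 for h = x_max - x
    and dynamics x' = u. Passing alpha=None disables the filter (for comparison).
    """
    if alpha is None:
        return u_ref
    return min(u_ref, alpha * (x_max - x))


def rollout(alpha, x0=0.0, x_goal=12.0, x_max=10.0, dt=0.1, steps=100):
    """Simulate with forward Euler; return (tracking cost, smallest h seen).

    The goal lies outside the safe set on purpose, so the reference input
    keeps pushing toward the boundary and the CBF constraint must intervene.
    In this discretization safety is preserved as long as alpha * dt <= 1.
    """
    x, cost, min_h = x0, 0.0, x_max - x0
    for _ in range(steps):
        u_ref = x_goal - x                      # simple proportional reference input
        u = cbf_filter(x, u_ref, alpha, x_max)  # inner level: safe input
        x += dt * u
        cost += (x_goal - x) ** 2 * dt          # performance: distance to the goal
        min_h = min(min_h, x_max - x)
    return cost, min_h


def tune_alpha(candidates):
    """Outer level: pick the CBF parameter with the lowest rollout cost.

    A grid search is used here as a stand-in for the RL-based tuning.
    """
    results = {a: rollout(a) for a in candidates}
    best = min(results, key=lambda a: results[a][0])
    return best, results


best_alpha, results = tune_alpha([0.2, 0.5, 1.0, 2.0, 5.0])
unfiltered_cost, unfiltered_min_h = rollout(None)
```

In this toy setting a small `alpha` keeps the state far from the boundary for longer (more conservative, higher cost), while every candidate keeps `h >= 0`; the unfiltered controller leaves the safe set. Because the sketch has no input bounds, it cannot reproduce the infeasibility issue discussed above, so the largest candidate simply wins here.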