Safety-Aware Preference-Based Learning for Safety-Critical Control

Ryan K. Cosner, Maegan Tucker, Andrew J. Taylor, Kejun Li, Tamás G. Molnár, Wyatt Ubellacker, Anil Alan, Gábor Orosz, Yisong Yue, Aaron D. Ames
DOI: https://doi.org/10.48550/arXiv.2112.08516
2022-04-12
Abstract: Bringing dynamic robots into the wild requires a tenuous balance between performance and safety. Yet controllers designed to provide robust safety guarantees often result in conservative behavior, and tuning these controllers to find the ideal trade-off between performance and safety typically requires domain expertise or a carefully constructed reward function. This work presents a design paradigm for systematically achieving behaviors that balance performance and robust safety by integrating safety-aware Preference-Based Learning (PBL) with Control Barrier Functions (CBFs). Fusing these concepts -- safety-aware learning and safety-critical control -- gives a robust means to achieve safe behaviors on complex robotic systems in practice. We demonstrate the capability of this design paradigm to achieve safe and performant perception-based autonomous operation of a quadrupedal robot both in simulation and experimentally on hardware.
Robotics, Systems and Control
What problem does this paper attempt to address?
This paper addresses how to balance performance and safety when controlling dynamic robots, particularly complex robotic systems. Controllers designed to provide robust safety guarantees often behave conservatively, and tuning them toward the right trade-off between performance and safety usually requires domain expertise or a carefully constructed reward function. The paper therefore proposes a design paradigm that combines preference-based learning (PBL) with control barrier functions (CBFs).

### Main Contributions

1. **Proposed Safety-Aware LineCoSpar (SA-LineCoSpar)**: An extension of the LineCoSpar algorithm that performs preference-based Bayesian optimization in high-dimensional parameter spaces while accounting for safety.
2. **Combined Measurement-Robust CBFs (MR-CBFs) and Input-to-State Safe CBFs (ISSf-CBFs)**: MR-CBFs handle measurement uncertainty and ISSf-CBFs handle disturbances. Together they give provable safety guarantees through multi-layer, reduced-order safety-critical control.
3. **Verified on a quadruped robot**: The method was tested in simulation and in hardware experiments in laboratory and outdoor environments, demonstrating perception-based autonomous operation of a quadruped robot.

### Method Overview

- **Preference-Based Learning (PBL)**: Tunes design parameters from users' subjective feedback (such as pairwise preferences and ordinal labels), which avoids having to define a reward function explicitly.
- **Control Barrier Functions (CBFs)**: Used to ensure the safety of the system, especially in the presence of measurement uncertainty and disturbances.
- **Safety-Aware LineCoSpar (SA-LineCoSpar)**: Combines PBL with CBFs so that unsafe behaviors are avoided during exploration while system performance is maintained.

### Experimental Results

- **Simulation and Hardware Experiments**: Obstacle avoidance tasks in indoor and outdoor environments were carried out on the Unitree A1 quadruped robot. The results show that the TR-OP controller, with parameters tuned by SA-LineCoSpar, navigates between obstacles while keeping the system safe.

### Formulas

- **Definition of CBFs**:
  \[ \sup_{v \in \mathbb{R}^m} \left( L_f h(x, \rho) + L_g h(x, \rho) v \right) > -\alpha(h(x, \rho)) \]
  where \( L_f h(x, \rho) \) and \( L_g h(x, \rho) \) are the Lie derivatives of \( h \) with respect to \( f \) and \( g \), respectively.
- **Definition of ISSf-CBFs**:
  \[ \sup_{v \in \mathbb{R}^m} \left( L_f h(x, \rho) + L_g h(x, \rho) v - \phi \| L_g h(x, \rho) \|^2 \right) > -\alpha(h(x, \rho)) \]
- **Definition of MR-CBFs**:
  \[ \sup_{v \in \mathbb{R}^m} \left( L_f h(x, \hat{\rho}) + L_g h(x, \hat{\rho}) v - a - b \| v \| \right) > -\alpha(h(x, \hat{\rho})) \]
- **Safety Filter of TR-OP**:
  \[ k(x) = \arg\min_{v \in \mathbb{R}^m} \| v - k_{\text{nom}}(x) \|^2 \]
  Constraints:
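
The following is a minimal sketch of how a safety filter of this form can act on a nominal input. It is not a reconstruction of the TR-OP constraints, which this summary does not list. It assumes a single affine constraint taken from the CBF and ISSf-CBF conditions above, for which the minimizer has a closed form (a projection onto a half-space). The single-integrator dynamics, the circular obstacle centered at `rho`, the linear choice of \( \alpha \), and all function and parameter names are hypothetical choices made for illustration.

```python
import numpy as np


def cbf_safety_filter(k_nom, Lf_h, Lg_h, alpha_h, phi=0.0):
    """Return the input closest to k_nom (in the Euclidean norm) satisfying
    Lf_h + Lg_h @ v - phi * ||Lg_h||^2 >= -alpha_h.

    With phi = 0 this is the plain CBF condition; phi > 0 adds the
    ISSf-style robustness term (assumed constraint form, for illustration).
    """
    k_nom = np.asarray(k_nom, dtype=float)
    Lg_h = np.asarray(Lg_h, dtype=float)
    norm_sq = Lg_h @ Lg_h
    # Constraint value at the nominal input; non-negative means already safe.
    slack = Lf_h + Lg_h @ k_nom - phi * norm_sq + alpha_h
    if slack >= 0.0:
        return k_nom
    if norm_sq == 0.0:
        raise ValueError("constraint is infeasible: Lg_h is zero")
    # Project k_nom onto the boundary of the half-space defined by the constraint.
    return k_nom - (slack / norm_sq) * Lg_h


def filtered_input(x, rho, k_nom, radius=1.0, alpha=1.0, phi=0.0):
    """Hypothetical example: single integrator x_dot = v avoiding a disc.

    h(x, rho) = ||x - rho||^2 - radius^2, so Lf_h = 0 and Lg_h = 2 (x - rho).
    alpha(h) = alpha * h is an assumed linear class-K choice.
    """
    x = np.asarray(x, dtype=float)
    rho = np.asarray(rho, dtype=float)
    h = (x - rho) @ (x - rho) - radius**2
    Lf_h = 0.0
    Lg_h = 2.0 * (x - rho)
    return cbf_safety_filter(k_nom, Lf_h, Lg_h, alpha * h, phi)


if __name__ == "__main__":
    # Nominal input drives straight at the obstacle; the filter slows the approach.
    print(filtered_input([2.0, 0.0], [0.0, 0.0], [-2.0, 0.0]))
    # Nominal input moves tangentially; the filter leaves it unchanged.
    print(filtered_input([2.0, 0.0], [0.0, 0.0], [0.0, 1.0]))
```

In this toy setting, a larger `phi` tightens the constraint and gives more conservative inputs. That is the kind of performance-versus-safety trade-off the summary describes being tuned through preference feedback.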