Distributionally Safe Reinforcement Learning under Model Uncertainty: A Single-Level Approach by Differentiable Convex Programming

Alaa Eddine Chriat, Chuangchuang Sun
2023-10-04
Abstract: Safety assurance is non-negotiable in safety-critical environments subject to drastic model uncertainties (e.g., distributional shift), especially with humans in the loop. However, incorporating uncertainty into safe learning naturally leads to a bi-level problem, in which the lower level evaluates the (worst-case) safety constraint within the uncertainty ambiguity set. In this paper, we present a tractable distributionally safe reinforcement learning framework that enforces safety under a distributional shift measured by a Wasserstein metric. To improve tractability, we first use duality theory to transform the lower-level optimization from the infinite-dimensional probability space, where distributional shift is measured, to a finite-dimensional parametric space. Moreover, using differentiable convex programming, we further reduce the bi-level safe learning problem to a single-level one with two sequential, computationally efficient modules: a convex quadratic program that guarantees safety, followed by projected gradient ascent that simultaneously finds the worst-case uncertainty. To the best of our knowledge, this end-to-end differentiable framework with safety constraints is the first tractable single-level solution to address distributional safety. We test our approach on first- and second-order systems of varying complexity and compare the results with uncertainty-agnostic policies; our approach demonstrates a significant improvement in safety guarantees.
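The safety module (a convex QP placed in front of the learned policy) can be illustrated with a minimal sketch. It assumes a first-order single-integrator system x_dot = u in the plane, a hypothetical circular obstacle with barrier h(x) = ||x - c||^2 - r^2, and a linear class-K term alpha * h. None of these choices come from the paper, and the helper names (barrier, barrier_grad, cbf_qp_filter, filter_jacobian) are hypothetical. With a single affine CBF constraint, the QP min ||u - u_ref||^2 s.t. grad_h(x) . u + alpha * h(x) >= 0 has a closed-form solution, namely the projection of u_ref onto a half-space. Its Jacobian with respect to u_ref shows in miniature why such a safety layer can be differentiated end to end. Problems with several constraints would need a general differentiable QP solver; this is not the authors' implementation.

```python
def barrier(x, center, radius):
    """h(x) = ||x - c||^2 - r^2; h(x) >= 0 means x is outside the circular obstacle."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center)) - radius ** 2


def barrier_grad(x, center):
    """Gradient of h with respect to x: 2 (x - c)."""
    return [2.0 * (xi - ci) for xi, ci in zip(x, center)]


def cbf_qp_filter(x, u_ref, center, radius, alpha=1.0):
    """Solve min ||u - u_ref||^2 s.t. grad_h(x) . u + alpha * h(x) >= 0.

    For x_dot = u there is a single affine constraint a . u >= b, so the QP
    solution is the Euclidean projection of u_ref onto that half-space.
    Returns (u_safe, active), where active means the constraint binds.
    """
    a = barrier_grad(x, center)
    b = -alpha * barrier(x, center, radius)
    slack = sum(ak * uk for ak, uk in zip(a, u_ref)) - b
    if slack >= 0.0:
        # The reference action already satisfies the CBF condition.
        return list(u_ref), False
    step = -slack / sum(ak * ak for ak in a)
    return [uk + step * ak for uk, ak in zip(u_ref, a)], True


def filter_jacobian(x, u_ref, center, radius, alpha=1.0):
    """d u_safe / d u_ref: identity if inactive, I - a a^T / ||a||^2 if active."""
    a = barrier_grad(x, center)
    _, active = cbf_qp_filter(x, u_ref, center, radius, alpha)
    n = len(a)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if not active:
        return eye
    norm_sq = sum(ak * ak for ak in a)
    return [[eye[i][j] - a[i] * a[j] / norm_sq for j in range(n)] for i in range(n)]


# A reference action that heads straight at the obstacle gets minimally corrected.
u_safe, active = cbf_qp_filter(x=(2.0, 0.0), u_ref=(-10.0, 0.0),
                               center=(0.0, 0.0), radius=1.0)
print(u_safe, active)  # [-0.75, 0.0] True
```

When the constraint is active, the Jacobian zeroes out the component of u_ref along grad_h and passes the tangential component through unchanged. A gradient-based policy update can therefore still flow through the safety layer.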
Machine Learning, Robotics, Systems and Control
What problem does this paper attempt to address?
This paper addresses how to guarantee the safety of a reinforcement learning (RL) system in the presence of model uncertainty. Specifically, it designs an RL framework that remains safe under distributional shift, that is, a mismatch between the probability distribution of the environment in which the system actually operates and the distribution assumed during training. Such a mismatch can degrade the trained policy in deployment and even cause dangerous behavior.

The paper builds on distributionally robust optimization (DRO). It measures distributional shift with the Wasserstein metric and incorporates it into a control barrier function (CBF) constraint, so that safety is enforced for the worst-case distribution in the resulting ambiguity set. Because evaluating this worst-case constraint requires an inner optimization, the problem is naturally bi-level. To make it tractable, the authors first use duality theory to transform the lower-level problem from the infinite-dimensional space of probability distributions to a finite-dimensional parameter space. They then use differentiable convex programming to reduce the bi-level problem to a single-level one composed of two sequential, computationally efficient modules: a convex quadratic program (QP) that enforces safety, followed by projected gradient ascent that finds the worst-case uncertainty. The method therefore enforces the safety constraint and identifies the worst-case uncertainty within the same pipeline.

The main contribution is an end-to-end differentiable, single-level framework that, to the authors' knowledge, is the first tractable single-level solution for safe RL under distributional shift. In experiments on first- and second-order systems of varying complexity, the method substantially improves safety compared with uncertainty-agnostic policies.
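The worst-case search can be sketched in a deliberately simplified setting. This is an assumption-laden illustration, not the paper's dual reformulation. Suppose the uncertainty is an additive disturbance xi in x_dot = u + xi, known only through a few samples. Consider the CBF violation -(a . (u + xi) + alpha * h), which is linear in xi with Lipschitz constant ||a||. For such a linear loss and a Euclidean transport cost, the worst-case expected violation over a Wasserstein ball of radius eps around the empirical distribution equals the empirical mean plus eps * ||a||. This value is attained by shifting every sample by the same vector of length eps. The search therefore collapses to a finite-dimensional shift vector constrained to a ball, which projected gradient ascent can handle. Gradients are taken by central differences so the same loop would also accept a nonlinear violation. The helper names (mean_violation, project_to_ball, projected_gradient_ascent) and all numbers are hypothetical.

```python
import math


def mean_violation(shift, samples, a, u, alpha_h):
    """Empirical mean of the CBF violation -(a . (u + xi + shift) + alpha_h)."""
    total = 0.0
    for xi in samples:
        margin = sum(ak * (uk + xk + sk) for ak, uk, xk, sk in zip(a, u, xi, shift))
        total -= margin + alpha_h
    return total / len(samples)


def project_to_ball(v, radius):
    """Euclidean projection onto the ball {v : ||v|| <= radius}."""
    norm = math.sqrt(sum(vk * vk for vk in v))
    if norm <= radius:
        return list(v)
    return [vk * radius / norm for vk in v]


def projected_gradient_ascent(f, x0, radius, lr=0.1, iters=100, fd_step=1e-6):
    """Maximise f over ||x|| <= radius using central-difference gradients."""
    x = project_to_ball(x0, radius)
    for _ in range(iters):
        grad = []
        for k in range(len(x)):
            up = list(x)
            up[k] += fd_step
            down = list(x)
            down[k] -= fd_step
            grad.append((f(up) - f(down)) / (2.0 * fd_step))
        # Ascent step, then project back onto the uncertainty ball.
        x = project_to_ball([xk + lr * gk for xk, gk in zip(x, grad)], radius)
    return x


# Hypothetical numbers: x = (2, 0), obstacle at the origin with r = 1, so
# a = grad_h = (4, 0) and alpha * h = 3; u = (-0.75, 0) sits on the CBF boundary.
samples = [(0.1, 0.0), (-0.1, 0.2), (0.0, -0.2)]
a, u, alpha_h, eps = [4.0, 0.0], [-0.75, 0.0], 3.0, 0.1


def violation(shift):
    return mean_violation(shift, samples, a, u, alpha_h)


nominal = violation([0.0, 0.0])
shift = projected_gradient_ascent(violation, [0.0, 0.0], eps)
worst = violation(shift)
print(nominal, shift, worst)
```

For the nominal samples this action has a mean violation of zero, i.e. it is exactly on the safety boundary. Under the worst-case shift of length eps along -a, the mean violation rises to eps * ||a|| = 0.4. A filter that ignores distributional shift leaves exactly this kind of gap open.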