Abstract: Safety assurance cannot be compromised in safety-critical environments subject to drastic model uncertainties (e.g., distributional shift), especially with humans in the loop. However, incorporating uncertainty into safe learning naturally leads to a bi-level problem, where at the lower level the (worst-case) safety constraint is evaluated within the uncertainty ambiguity set. In this paper, we present a tractable distributionally safe reinforcement learning framework to enforce safety under a distributional shift measured by a Wasserstein metric. To improve tractability, we first use duality theory to transform the lower-level optimization from the infinite-dimensional probability space, where distributional shift is measured, to a finite-dimensional parametric space. Moreover, by differentiable convex programming, the bi-level safe learning problem is further reduced to a single-level one with two sequential, computationally efficient modules: a convex quadratic program to guarantee safety, followed by a projected gradient ascent to simultaneously find the worst-case uncertainty. This end-to-end differentiable framework with safety constraints is, to the best of our knowledge, the first tractable single-level solution to address distributional safety. We test our approach on first- and second-order systems with varying complexities and compare our results with uncertainty-agnostic policies, where our approach demonstrates a significant improvement in safety guarantees.
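
The duality step mentioned in the abstract can be illustrated with the standard strong-duality result for type-1 Wasserstein ambiguity sets. This is a generic sketch, not necessarily the exact formulation used in the paper: the loss $\ell$, the samples $\hat{\xi}_i$, the empirical distribution $\hat{P}_N$, the radius $\varepsilon$, and the multiplier $\lambda$ are all notation assumed here for illustration, and the identity holds under mild regularity conditions on $\ell$.

```latex
\sup_{Q \,:\, W_1(Q, \hat{P}_N) \le \varepsilon} \mathbb{E}_{\xi \sim Q}\big[\ell(\xi)\big]
\;=\;
\inf_{\lambda \ge 0} \left\{ \lambda \varepsilon
+ \frac{1}{N} \sum_{i=1}^{N} \sup_{\xi} \Big( \ell(\xi) - \lambda \,\lVert \xi - \hat{\xi}_i \rVert \Big) \right\}
```

The left-hand side optimizes over probability distributions (infinite-dimensional), while the right-hand side optimizes over a scalar $\lambda$ and, per sample, a point $\xi$ in the uncertainty space (finite-dimensional).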

What problem does this paper attempt to address?

The paper addresses how to keep a reinforcement learning system safe in the presence of model uncertainty, and specifically how to design a framework that guarantees safety under distributional shift. Here, distributional shift refers to the difference between the probability distribution of the system's operating environment and the distribution assumed during training. This difference can degrade the performance of the trained policy in practice and can even lead to dangerous behavior.

The paper proposes a method based on distributionally robust optimization (DRO). It quantifies distributional shift with the Wasserstein metric and incorporates it into a control barrier function (CBF) to ensure safety under model uncertainty. To improve tractability, the authors first use duality theory to transform the lower-level optimization problem from an infinite-dimensional probability space to a finite-dimensional parameter space. Then, through differentiable convex programming, the bi-level optimization problem is reduced to a single-level one, which improves computational efficiency. The method enforces the safety constraint while simultaneously finding the worst-case uncertainty, so it addresses safety and robustness together. The main contribution is an end-to-end differentiable single-level optimization framework that efficiently implements safe reinforcement learning under distributional shift. Experimental results show that, compared with policies that do not account for uncertainty, the method significantly improves safety.
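
The two modules described above (a convex quadratic program for safety and a projected gradient ascent for the worst-case uncertainty) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a single-integrator system with an additive disturbance, a circular obstacle, and a simple norm ball in place of a Wasserstein ambiguity set. All function names and parameters are hypothetical. Because the disturbance is additive here, the worst case does not depend on the action, so this sketch computes it before solving the QP.

```python
import numpy as np

def cbf_qp_filter(u_ref, a, b):
    """Closed-form solution of  min ||u - u_ref||^2  s.t.  a @ u >= b.

    Assumes a is nonzero whenever the constraint is violated.
    """
    slack = b - a @ u_ref
    if slack <= 0.0:
        return u_ref.copy()              # reference action is already safe
    return u_ref + (slack / (a @ a)) * a  # project onto the half-space

def project_to_ball(w, radius):
    """Euclidean projection onto the ball ||w|| <= radius."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def worst_case_disturbance(grad_h, radius, steps=200, lr=0.1):
    """Projected gradient ascent on the violation -grad_h @ w."""
    w = np.zeros_like(grad_h)
    for _ in range(steps):
        w = project_to_ball(w + lr * (-grad_h), radius)
    return w

def safe_action(x, u_ref, center, obstacle_radius, alpha=1.0, eps=0.2):
    """Filter u_ref so that grad_h @ (u + w) + alpha * h >= 0 for the worst w."""
    h = np.sum((x - center) ** 2) - obstacle_radius ** 2  # h >= 0 means safe
    grad_h = 2.0 * (x - center)
    w = worst_case_disturbance(grad_h, eps)
    b = -alpha * h - grad_h @ w
    u = cbf_qp_filter(u_ref, grad_h, b)
    return u, w

# Usage: the reference action drives straight at the obstacle.
x = np.array([2.0, 0.0])
u, w = safe_action(x, np.array([-5.0, 0.0]), np.zeros(2), 1.0)
print("filtered action:", u, "worst-case disturbance:", w)
```

In this toy setting both modules have cheap solutions, which mirrors the motivation for a single-level formulation: the safety filter only modifies the reference action when the worst-case barrier condition would otherwise be violated.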

Distributionally Safe Reinforcement Learning under Model Uncertainty: A Single-Level Approach by Differentiable Convex Programming

Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning

Safe Distributional Reinforcement Learning

Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate

Distributionally Robust Model-based Reinforcement Learning with Large State Spaces

Wasserstein Distributionally Robust Control Barrier Function using Conditional Value-at-Risk with Differentiable Convex Programming

Safe Model-Based Reinforcement Learning for Systems with Parametric Uncertainties

Distributionally Robust Constrained Reinforcement Learning under Strong Duality

Lyapunov-based uncertainty-aware safe reinforcement learning

Safe Wasserstein Constrained Deep Q-Learning

Distributionally Robust Policy and Lyapunov-Certificate Learning

Value-Distributional Model-Based Reinforcement Learning

Online Optimization and Ambiguity-Based Learning of Distributionally Uncertain Dynamic Systems

Distributionally Robust Infinite-horizon Control: from a pool of samples to the design of dependable controllers

Distributionally Robust Safety Verification for Markov Decision Processes

Conservative Distributional Reinforcement Learning with Safety Constraints

Minimizing Safety Interference for Safe and Comfortable Automated Driving with Distributional Reinforcement Learning

Probabilistic Safeguard for Reinforcement Learning Using Safety Index Guided Gaussian Process Models

Verifiably Safe Off-Model Reinforcement Learning

Distributional Method for Risk Averse Reinforcement Learning