Abstract: Many real-world domains require safe decision making in uncertain environments. In this work, we introduce a deep reinforcement learning framework for approaching this important problem. We consider a distribution over transition models, and apply a risk-averse perspective towards model uncertainty through the use of coherent distortion risk measures. We provide robustness guarantees for this framework by showing it is equivalent to a specific class of distributionally robust safe reinforcement learning problems. Unlike existing approaches to robustness in deep reinforcement learning, however, our formulation does not involve minimax optimization. This leads to an efficient, model-free implementation of our approach that only requires standard data collection from a single training environment. In experiments on continuous control tasks with safety constraints, we demonstrate that our framework produces robust performance and safety at deployment time across a range of perturbed test environments.

What problem does this paper attempt to address?

This paper addresses safe decision-making in uncertain environments: when the transition model is uncertain, how can a reinforcement learning (RL) algorithm still deliver robust performance and safety at deployment time? The paper introduces a deep reinforcement learning framework that takes a risk-averse perspective on model uncertainty by applying coherent distortion risk measures to a distribution over transition models. The approach comes with theoretical robustness guarantees and avoids the minimax optimization common in existing robust reinforcement learning methods, which allows an efficient, model-free implementation that needs data from only a single training environment. Experiments on continuous control tasks show robust performance and safety across a range of perturbed test environments.

### Core Contributions of the Paper

1. **Introducing a Risk-Averse Perspective**: The paper reformulates the safe reinforcement learning problem by applying coherent distortion risk measures to model uncertainty, and proposes the corresponding Bellman operator.
2. **Theoretical Robustness Guarantee**: The proposed framework is proven to be equivalent to a specific class of distributionally robust safe reinforcement learning problems, which provides robustness guarantees.
3. **Efficient Deep RL Implementation**: The paper proposes an efficient deep reinforcement learning implementation that avoids the difficult minimax optimization of robust reinforcement learning and only requires data collected from a single training environment.
4. **Experimental Verification**: Experiments on continuous control tasks with safety constraints demonstrate the framework's robust performance and safety at deployment time.
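
For readers unfamiliar with the term in the first contribution, a distortion risk measure can be written as below. This is the standard textbook form for a nonnegative cost random variable $Z$; the symbols $g$, $Z$, and $\alpha$ are notation chosen here for illustration and need not match the paper's own.

$$
\rho_g(Z) = \int_0^\infty g\bigl(\Pr(Z > z)\bigr)\,dz,
\qquad g:[0,1]\to[0,1] \text{ nondecreasing},\quad g(0)=0,\quad g(1)=1 .
$$

The measure is coherent when $g$ is concave. Two familiar special cases:

$$
g(u) = u \;\Rightarrow\; \rho_g(Z) = \mathbb{E}[Z],
\qquad
g(u) = \min\!\left(\frac{u}{\alpha},\, 1\right) \;\Rightarrow\; \rho_g(Z) = \text{the mean of the worst } \alpha\text{-fraction of outcomes (CVaR)} .
$$

A concave $g$ inflates the probability of high-cost outcomes, which is what makes the measure risk-averse.
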
### Key Technologies

- **Coherent Distortion Risk Measure**: Used to quantify and handle model uncertainty so that safe decisions can still be made in uncertain environments.
- **Distributionally Robust Optimization**: By considering a distribution over transition models, the problem is shown to be equivalent to a distributionally robust optimization problem, which provides robustness guarantees.
- **Sample-Based Risk Measure Estimation**: The risk measure is estimated as a weighted average of samples, which keeps the algorithm efficient in practice.

### Experimental Results

The paper reports experiments on five continuous control tasks from the Real-World RL Suite: Cartpole Swingup, Walker Walk, Walker Run, Quadruped Walk, and Quadruped Run. The results show that the proposed Risk-Averse Model Uncertainty (RAMU) framework maintains good performance and safety across a variety of perturbed test environments, supporting its robustness.

In conclusion, by introducing a new way of handling model uncertainty, this paper improves the robustness and safety of reinforcement learning in uncertain environments, in support of safe decision-making in practical applications.
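
The sample-based estimation mentioned under Key Technologies can be illustrated with a small sketch. It uses the standard empirical estimator of a distortion risk measure: sort the sampled costs from worst to best and weight the i-th one by `g(i/n) - g((i-1)/n)`, where `g` is the distortion function. The function names (`distortion_weights`, `distortion_risk`, `cvar_distortion`) are hypothetical and chosen for illustration; this is not the paper's implementation, which applies such an estimate inside a deep RL update.

```python
from typing import Callable, List, Sequence


def distortion_weights(n: int, g: Callable[[float], float]) -> List[float]:
    """Weights for n samples sorted from worst (highest cost) to best.

    The i-th weight is g(i/n) - g((i-1)/n); the weights sum to g(1) - g(0) = 1.
    """
    return [g(i / n) - g((i - 1) / n) for i in range(1, n + 1)]


def distortion_risk(costs: Sequence[float], g: Callable[[float], float]) -> float:
    """Empirical distortion risk measure: a weighted average of sorted samples."""
    ordered = sorted(costs, reverse=True)  # worst outcomes first
    weights = distortion_weights(len(ordered), g)
    return sum(w * z for w, z in zip(weights, ordered))


def cvar_distortion(alpha: float) -> Callable[[float], float]:
    """Distortion function for CVaR: the mean of the worst alpha-fraction."""
    def g(u: float) -> float:
        return min(u / alpha, 1.0)
    return g


def expectation_distortion(u: float) -> float:
    """Identity distortion: recovers the plain (risk-neutral) sample mean."""
    return u


if __name__ == "__main__":
    # Hypothetical costs, e.g. one value per sampled transition model.
    sampled_costs = [1.0, 2.0, 3.0, 4.0]
    print(distortion_risk(sampled_costs, expectation_distortion))
    print(distortion_risk(sampled_costs, cvar_distortion(0.5)))
```

With a concave `g`, the larger weights fall on the worst samples, so the estimate is never below the plain mean. For rewards rather than costs, the same idea applies with the ordering reversed, so that the lowest values receive the most weight.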

Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning

Robust Reinforcement Learning with Dynamic Distortion Risk Measures

Distributionally Safe Reinforcement Learning under Model Uncertainty: A Single-Level Approach by Differentiable Convex Programming

Robust Reinforcement Learning with Distributional Risk-averse formulation

One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning

Distributionally Robust Model-based Reinforcement Learning with Large State Spaces

Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate

Improving Robustness via Risk Averse Distributional Reinforcement Learning

Optimal Transport Perturbations for Safe Reinforcement Learning with Robustness Guarantees

The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model

Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts

Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning

Risk-Sensitive and Robust Model-Based Reinforcement Learning and Planning

Safe Distributional Reinforcement Learning

Robust Safe Reinforcement Learning under Adversarial Disturbances

Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk

Lyapunov-based uncertainty-aware safe reinforcement learning

Enabling risk-aware Reinforcement Learning for medical interventions through uncertainty decomposition

Distributional Method for Risk Averse Reinforcement Learning

On the Foundation of Distributionally Robust Reinforcement Learning

Safe Reinforcement Learning with Dual Robustness