Abstract: Robustness and safety are critical for the trustworthy deployment of deep reinforcement learning. Real-world decision-making applications require algorithms that can guarantee robust performance and safety in the presence of general environment disturbances, while making limited assumptions on the data collection process during training. In order to accomplish this goal, we introduce a safe reinforcement learning framework that incorporates robustness through the use of an optimal transport cost uncertainty set. We provide an efficient implementation based on applying Optimal Transport Perturbations to construct worst-case virtual state transitions, which does not impact data collection during training and does not require detailed simulator access. In experiments on continuous control tasks with safety constraints, our approach demonstrates robust performance while significantly improving safety at deployment time compared to standard safe reinforcement learning.

What problem does this paper attempt to address?

This paper addresses the problem of achieving both robustness and safety in deep reinforcement learning (DRL). The authors propose a framework that uses optimal transport cost (OTC) uncertainty sets to make the algorithm robust to environment disturbances while maintaining safety at deployment time. The framework targets robust performance and safety under general forms of environment uncertainty, while making limited assumptions about the data collection process and without requiring access to a detailed simulator.

### Core Problems of the Paper

1. **Robustness and Safety**: How can deep reinforcement learning algorithms remain robust and safe in the presence of environment disturbances?
2. **General Uncertainty Handling**: How can a method handle many forms of environment uncertainty when there is no prior knowledge about the structure of the disturbances?
3. **Data Collection and Training Process**: How can robustness and safety be achieved with a standard data collection process, without changing how data is collected in the real environment and without requiring a detailed simulator?

### Solutions

The authors propose a safe reinforcement learning framework whose robustness is based on optimal transport cost, with the following main contributions:

1. **Robustness Framework**: A safe reinforcement learning framework that incorporates robustness through an uncertainty set defined by the optimal transport cost.
2. **Recasting of the Worst-Case Optimization Problem**: The authors prove that Optimal Transport Perturbations (OTP) can construct worst-case virtual state transitions directly in the state space, which transforms the worst-case optimization problem into a more tractable form.
3. **Efficient Implementation**: An efficient deep reinforcement learning implementation that applies Optimal Transport Perturbations to construct worst-case virtual state transitions. Because the perturbations act on virtual transitions, they do not affect data collection during training.
4. **Experimental Verification**: In experiments on continuous control tasks with safety constraints, the method demonstrates robust performance and significantly improves safety at deployment time compared to standard safe reinforcement learning.

### Formulas and Concepts

- **Optimal Transport Cost**:

  \[ \text{OTC}_{d_{s,a}}(\hat{p}_{s,a}, p_{s,a}) = \inf_{\nu \in \Gamma(\hat{p}_{s,a}, p_{s,a})} \int_{S \times S} d_{s,a}(\hat{s}', s') \, d\nu(\hat{s}', s') \]

  where $\Gamma(\hat{p}_{s,a}, p_{s,a})$ is the set of all couplings of $\hat{p}_{s,a}$ and $p_{s,a}$.

- **Robust Bellman Operator** (worst case over the uncertainty set: an infimum for the reward $r$, a supremum for the cost $c$):

  \[ T^\pi_{P,r} Q_r(s,a) = r(s,a) + \gamma \inf_{p_{s,a} \in P_{s,a}} \mathbb{E}_{s' \sim p_{s,a}} [V^\pi_r(s')] \]

  \[ T^\pi_{P,c} Q_c(s,a) = c(s,a) + \gamma \sup_{p_{s,a} \in P_{s,a}} \mathbb{E}_{s' \sim p_{s,a}} [V^\pi_c(s')] \]

- **Optimal Transport Perturbation**:

  \[ g^r_{s,a} \in \arg \min_{g \in G} \mathbb{E}_{\hat{s}' \sim \hat{p}_{s,a}} [V^\pi_r(g(\hat{s}'))] \quad \text{s.t.} \quad \mathbb{E}_{\hat{s}' \sim \hat{p}_{s,a}} [d_{s,a}(\hat{s}', g(\hat{s}'))]
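
The concepts above can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes a one-dimensional state, the ground cost $d(\hat{s}', s') = |\hat{s}' - s'|$, and that the constraint on the perturbation (cut off in the text above) bounds the transport distance by a budget, called `eps` here. The sketch enforces that budget per sample, which is a conservative simplification of a bound in expectation, and replaces any learned perturbation with a plain grid search. The names `ot_cost_1d`, `perturb`, and `robust_targets` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def ot_cost_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Optimal transport cost between two equal-size empirical distributions
    on the real line with ground cost |x - y|. In one dimension the optimal
    coupling matches sorted samples, so no solver is needed."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many samples")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def perturb(s_next: float, value_fn: Callable[[float], float],
            eps: float, n_grid: int, worst: Callable) -> float:
    """Grid search for a virtual next state within distance eps of s_next.
    Pass worst=min for the reward value and worst=max for the cost value.
    An odd n_grid keeps the unperturbed state among the candidates."""
    deltas = [eps * (2 * k / (n_grid - 1) - 1) for k in range(n_grid)]
    return worst((s_next + d for d in deltas), key=value_fn)


def robust_targets(r: float, c: float, next_states: Sequence[float],
                   v_r: Callable[[float], float], v_c: Callable[[float], float],
                   gamma: float, eps: float, n_grid: int = 21
                   ) -> Tuple[float, float, List[float], List[float]]:
    """Sample-based robust Bellman targets built from perturbed transitions.
    The reward target uses value-minimizing perturbations, the cost target
    uses value-maximizing ones. The real transitions are left untouched."""
    g_r = [perturb(s, v_r, eps, n_grid, min) for s in next_states]
    g_c = [perturb(s, v_c, eps, n_grid, max) for s in next_states]
    n = len(next_states)
    q_r = r + gamma * sum(v_r(s) for s in g_r) / n
    q_c = c + gamma * sum(v_c(s) for s in g_c) / n
    return q_r, q_c, g_r, g_c


if __name__ == "__main__":
    # Toy value functions (assumptions): reward value peaks at 0,
    # cost value grows with distance from 0.
    v_r = lambda s: -abs(s)
    v_c = lambda s: abs(s)
    samples = [0.0, 1.0]  # next states sampled from the nominal transition
    q_r, q_c, g_r, g_c = robust_targets(1.0, 0.0, samples, v_r, v_c,
                                        gamma=0.5, eps=0.5, n_grid=5)
    print("robust reward target:", q_r)
    print("robust cost target:", q_c)
    print("transport cost of reward perturbation:", ot_cost_1d(samples, g_r))
```

Because the perturbation is applied only to sampled next states when forming the targets, the data itself is collected as usual, which mirrors the claim that the approach does not affect data collection. With the unperturbed state among the candidates, the robust reward target can never exceed the nominal one, and the robust cost target can never fall below it.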

Optimal Transport Perturbations for Safe Reinforcement Learning with Robustness Guarantees

Optimal Transport-Assisted Risk-Sensitive Q-Learning

Robust Safe Reinforcement Learning under Adversarial Disturbances

Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning

Risk-Aware Reinforcement Learning through Optimal Transport Theory

Train Trajectory Optimization with High-Risk State Space Boundaries: A Safe Reinforcement Learning Approach

On the Robustness of Safe Reinforcement Learning under Observational Perturbations

ROSCOM: Robust Safe Reinforcement Learning on Stochastic Constraint Manifolds

Certifiable Robustness to Adversarial State Uncertainty in Deep Reinforcement Learning

Safe Reinforcement Learning Using Robust Control Barrier Functions

Safe Reinforcement Learning with Dual Robustness

Robust Reinforcement Learning for Continuous Control with Model Misspecification

Lyapunov-based uncertainty-aware safe reinforcement learning

Trustworthy autonomous driving via defense-aware robust reinforcement learning against worst-case observational perturbations

Robust Reinforcement Learning with UUB Guarantee for Safe Motion Control of Autonomous Robots

Robust Reinforcement Learning with Wasserstein Constraint

Disturbance Observer-based Control Barrier Functions with Residual Model Learning for Safe Reinforcement Learning

Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis

Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach

Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

Rethinking Optimal Transport in Offline Reinforcement Learning