Abstract:The sim-to-real gap, which represents the disparity between training and testing environments, poses a significant challenge in reinforcement learning (RL). A promising approach to addressing this challenge is distributionally robust RL, often framed as a robust Markov decision process (RMDP). In this framework, the objective is to find a robust policy that achieves good performance under the worst-case scenario among all environments within a pre-specified uncertainty set centered around the training environment. Unlike previous work, which relies on a generative model or a pre-collected offline dataset enjoying good coverage of the deployment environment, we tackle robust RL via interactive data collection, where the learner interacts with the training environment only and refines the policy through trial and error. In this robust RL paradigm, two main challenges emerge: managing distributional robustness while striking a balance between exploration and exploitation during data collection. Initially, we establish that sample-efficient learning without additional assumptions is unattainable owing to the curse of support shift; i.e., the potential disjointedness of the distributional supports between the training and testing environments. To circumvent such a hardness result, we introduce the vanishing minimal value assumption to RMDPs with a total-variation (TV) distance robust set, postulating that the minimal value of the optimal robust value function is zero. We prove that such an assumption effectively eliminates the support shift issue for RMDPs with a TV distance robust set, and present an algorithm with a provable sample complexity guarantee. Our work makes the initial step to uncovering the inherent difficulty of robust RL via interactive data collection and sufficient conditions for designing a sample-efficient algorithm accompanied by sharp sample complexity analysis.

Improving Robustness via Risk Averse Distributional Reinforcement Learning

Robust Reinforcement Learning with Distributional Risk-averse formulation

Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk

Robust Reinforcement Learning with Dynamic Distortion Risk Measures

Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning

Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty

Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm

Distributional Soft Actor Critic for Risk Sensitive Learning

Improved Sample Complexity Bounds for Distributionally Robust Reinforcement Learning

Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts

The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model

DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning

Distributionally Robust Constrained Reinforcement Learning under Strong Duality

On Practical Robust Reinforcement Learning: Adjacent Uncertainty Set and Double-Agent Algorithm.

Robust Quadrupedal Locomotion Via Risk-Averse Policy Learning

On the Foundation of Distributionally Robust Reinforcement Learning

Safe Distributional Reinforcement Learning

Robust Situational Reinforcement Learning in Face of Context Disturbances.

Single-Trajectory Distributionally Robust Reinforcement Learning

Robust Reinforcement Learning: A Review of Foundations and Recent Advances

Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence