Abstract: The Distributionally Robust Markov Decision Process (DRMDP) is a popular framework for addressing dynamics shift in reinforcement learning by learning policies robust to the worst-case transition dynamics within a constrained set. However, solving its dual optimization oracle poses significant challenges, limiting theoretical analysis and computational efficiency. The recently proposed Robust Regularized Markov Decision Process (RRMDP) replaces the uncertainty set constraint with a regularization term on the value function, offering improved scalability and theoretical insights. Yet, existing RRMDP methods rely on unstructured regularization, often leading to overly conservative policies by considering transitions that are unrealistic. To address these issues, we propose a novel framework, the $d$-rectangular linear robust regularized Markov decision process ($d$-RRMDP), which introduces a linear latent structure into both transition kernels and regularization. For the offline RL setting, where an agent learns robust policies from a pre-collected dataset in the nominal environment, we develop a family of algorithms, Robust Regularized Pessimistic Value Iteration (R2PVI), employing linear function approximation and $f$-divergence based regularization terms on transition kernels. We provide instance-dependent upper bounds on the suboptimality gap of R2PVI policies, showing these bounds depend on how well the dataset covers state-action spaces visited by the optimal robust policy under robustly admissible transitions. This term is further shown to be fundamental to $d$-RRMDPs via information-theoretic lower bounds. Finally, numerical experiments validate that R2PVI learns robust policies and is computationally more efficient than methods for constrained DRMDPs.

Efficient distributional reinforcement learning with Kullback-Leibler divergence regularization

Distributional Reinforcement Learning for Efficient Exploration

Distributional Reinforcement Learning with Regularized Wasserstein Loss

Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

How Does Value Distribution in Distributional Reinforcement Learning Help Optimization?

Distributional Soft Actor Critic for Risk Sensitive Learning

Single-Trajectory Distributionally Robust Reinforcement Learning

The $f$-Divergence Reinforcement Learning Framework

Distributionally Robust Constrained Reinforcement Learning under Strong Duality

Fully Parameterized Quantile Function for Distributional Reinforcement Learning

Efficient Deep Reinforcement Learning Requires Regulating Overfitting

More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning

Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage

Learning with Delayed Payoffs in Population Games using Kullback-Leibler Divergence Regularization

Boosting Offline Reinforcement Learning via Data Rebalancing

How Does Return Distribution in Distributional Reinforcement Learning Help Optimization?

Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence

Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

Robust Offline Reinforcement Learning with Linearly Structured $f$-Divergence Regularization

Distributional Reinforcement Learning With Quantile Regression