Abstract: Offline reinforcement learning aims to learn from pre-collected datasets without active exploration. This problem faces significant challenges, including limited data availability and distributional shift. Existing approaches adopt a pessimistic stance towards uncertainty, penalizing the rewards of under-explored state-action pairs to estimate value functions conservatively. In this paper, we show that a distributionally robust optimization (DRO) based approach can also address these challenges and is asymptotically minimax optimal. Specifically, we directly model the uncertainty in the transition kernel and construct an uncertainty set of statistically plausible transition kernels. We then show that the policy that optimizes the worst-case performance over this uncertainty set achieves near-optimal performance in the underlying problem. We first design a distribution-based uncertainty set, defined through a distance metric, such that the true transition kernel lies in this set with high probability. We prove that to achieve a sub-optimality gap of $\epsilon$, the sample complexity is $\mathcal{O}(S^2C^{\pi^*}\epsilon^{-2}(1-\gamma)^{-4})$, where $\gamma$ is the discount factor, $S$ is the number of states, and $C^{\pi^*}$ is the single-policy clipped concentrability coefficient, which quantifies the distribution shift. To achieve the optimal sample complexity, we further propose a less conservative value-function-based uncertainty set, which, however, does not necessarily include the true transition kernel. We show that this set yields an improved sample complexity of $\mathcal{O}(SC^{\pi^*}\epsilon^{-2}(1-\gamma)^{-3})$, which asymptotically matches the minimax lower bound for offline reinforcement learning and is thus asymptotically minimax optimal.
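To make the worst-case optimization over an uncertainty set concrete, here is a minimal tabular sketch in Python. It is an illustration under stated assumptions, not the paper's algorithm: the uncertainty set is assumed to be an L1 ball around an empirical transition kernel `P_hat`, the per-pair `radius` array is a hypothetical input (the abstract does not specify how the radii are chosen), and the function names are made up for this example.

```python
import numpy as np

def worst_case_expectation(p, v, radius):
    """Minimize q @ v over distributions q with ||q - p||_1 <= radius.

    Assumed L1-ball uncertainty set: the minimizer moves up to radius/2
    probability mass from the highest-value states to the lowest-value state.
    """
    q = np.asarray(p, dtype=float).copy()
    lo = int(np.argmin(v))
    budget = min(radius / 2.0, 1.0 - q[lo])  # mass we are allowed to move
    q[lo] += budget
    for s in np.argsort(-np.asarray(v)):     # highest-value states first
        if budget <= 0:
            break
        if s == lo:
            continue
        moved = min(q[s], budget)
        q[s] -= moved
        budget -= moved
    return float(q @ v)

def robust_value_iteration(P_hat, R, radius, gamma=0.9, iters=500):
    """Value iteration on the worst-case Bellman operator.

    P_hat:  (S, A, S) empirical transition kernel (assumed given)
    R:      (S, A) rewards
    radius: (S, A) hypothetical uncertainty radii, e.g. larger for
            state-action pairs with fewer samples in the dataset
    """
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.array([
            [R[s, a] + gamma * worst_case_expectation(P_hat[s, a], V, radius[s, a])
             for a in range(A)]
            for s in range(S)
        ])
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)
```

With `radius` set to zero everywhere this reduces to ordinary value iteration on `P_hat`; increasing the radius for poorly covered state-action pairs lowers their worst-case value, which is the sense in which the robust policy is conservative.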

DARL: Distance-Aware Uncertainty Estimation for Offline Reinforcement Learning

DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning

DROP: Conservative Model-based Optimization for Offline Reinforcement Learning

Uncertainty-aware Distributional Offline Reinforcement Learning

Uncertainty-Aware Data Augmentation for Offline Reinforcement Learning

Discriminant Distance-Aware Representation on Deterministic Uncertainty Quantification Methods

UAC: Offline Reinforcement Learning with Uncertain Action Constraint

Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning

Uncertainty-Based Out-of-Distribution Detection in Deep Reinforcement Learning

Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model

A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning

Uncertainty-driven Trajectory Truncation for Data Augmentation in Offline Reinforcement Learning

Selective Uncertainty Propagation in Offline RL

Achieving the Asymptotically Optimal Sample Complexity of Offline Reinforcement Learning: A DRO-Based Approach

Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions

Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning

DCE: Offline Reinforcement Learning with Double Conservative Estimates

SUMO: Search-Based Uncertainty Estimation for Model-Based Offline Reinforcement Learning

CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning

Robust Offline Reinforcement Learning for Non-Markovian Decision Processes

Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching