Abstract: We consider the problem of solving robust Markov decision processes (MDPs), which involve a set of discounted, finite-state, finite-action MDPs with uncertain transition kernels. The goal of planning is to find a robust policy that optimizes the worst-case values against the transition uncertainties, which encompasses standard MDP planning as a special case. For $(\mathbf{s},\mathbf{a})$-rectangular uncertainty sets, we develop a policy-based first-order method, namely robust policy mirror descent (RPMD), and establish $\mathcal{O}(\log(1/\epsilon))$ and $\mathcal{O}(1/\epsilon)$ iteration complexities for finding an $\epsilon$-optimal policy under two increasing-stepsize schemes. These convergence results apply to any Bregman divergence, provided the policy space has bounded radius, as measured by the divergence, when centered at the initial policy. Moreover, when the Bregman divergence corresponds to the squared Euclidean distance, we establish an $\mathcal{O}(\max \{1/\epsilon, 1/(\eta \epsilon^2)\})$ complexity of RPMD with any constant stepsize $\eta$. For a general class of Bregman divergences, a similar complexity is also established for RPMD with constant stepsizes, provided the uncertainty set satisfies relative strong convexity. We further develop a stochastic variant, named SRPMD, for the setting where first-order information is available only through online interactions with the nominal environment. For general Bregman divergences, we establish $\mathcal{O}(1/\epsilon^2)$ and $\mathcal{O}(1/\epsilon^3)$ sample complexities with two increasing-stepsize schemes. For the Euclidean Bregman divergence, we establish an $\mathcal{O}(1/\epsilon^3)$ sample complexity with constant stepsizes. To the best of our knowledge, all the aforementioned results appear to be new for policy-based first-order methods applied to the robust MDP problem.
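To make the deterministic setting concrete, the following is a minimal sketch of a mirror-descent-style robust policy update. It is an illustration under stated assumptions, not the paper's algorithm as specified: it assumes a cost-minimization convention, a KL Bregman divergence (which gives a multiplicative-weights update), a finite list of candidate next-state distributions for each state-action pair as the $(\mathbf{s},\mathbf{a})$-rectangular uncertainty set, and a geometrically increasing stepsize `eta0 / gamma**k`. The names `robust_q`, `softmax`, `rpmd_kl`, `cost`, and `kernels` are hypothetical.

```python
import math

def robust_q(policy, cost, kernels, gamma, sweeps=500):
    """Worst-case Q-values of a fixed policy via repeated robust Bellman backups."""
    n_states = len(cost)
    v = [0.0] * n_states
    q = [[0.0] * len(cost[s]) for s in range(n_states)]
    for _ in range(sweeps):
        for s in range(n_states):
            for a in range(len(cost[s])):
                # Rectangularity (assumed finite set): the adversary picks the
                # worst candidate distribution independently for each (s, a).
                worst = max(sum(p_i * v_i for p_i, v_i in zip(p, v))
                            for p in kernels[s][a])
                q[s][a] = cost[s][a] + gamma * worst
        # Policy-averaged value used in the next backup.
        v = [sum(pi_a * q_a for pi_a, q_a in zip(policy[s], q[s]))
             for s in range(n_states)]
    return q

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    w = [math.exp(x - m) for x in logits]
    z = sum(w)
    return [x / z for x in w]

def rpmd_kl(cost, kernels, gamma, iters=30, eta0=1.0):
    """KL mirror descent on robust Q-values, tracked in logit space."""
    n_states = len(cost)
    logits = [[0.0] * len(cost[s]) for s in range(n_states)]  # uniform start
    for k in range(iters):
        policy = [softmax(row) for row in logits]
        q = robust_q(policy, cost, kernels, gamma)
        eta = eta0 / gamma ** k  # assumed increasing-stepsize schedule
        for s in range(n_states):
            # KL prox step: pi_{k+1}(a|s) is proportional to
            # pi_k(a|s) * exp(-eta * Q(s, a)).
            logits[s] = [l - eta * q_sa for l, q_sa in zip(logits[s], q[s])]
    return [softmax(row) for row in logits]

# Toy instance (hypothetical): action 0 is free, action 1 costs 1 in both states.
cost = [[0.0, 1.0], [0.0, 1.0]]
candidates = [[0.9, 0.1], [0.1, 0.9]]
kernels = [[candidates, candidates], [candidates, candidates]]
policy = rpmd_kl(cost, kernels, gamma=0.9)
```

On this toy instance the returned policy should concentrate on the zero-cost action in both states. Working with logits rather than probabilities avoids taking the logarithm of a probability that has underflowed to zero as the stepsize grows.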

Approximate Bilevel Difference Convex Programming for Bayesian Risk Markov Decision Processes

Multistage Robust Mixed-Integer Optimization under Endogenous Uncertainty

Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures

A Bayesian Risk Approach to Data-driven Stochastic Optimization: Formulations and Asymptotics

An Approximate Solution Method for Large Risk-Averse Markov Decision Processes

Robust Average-Reward Markov Decision Processes

Risk-Averse Bayes-Adaptive Reinforcement Learning

Constrained Risk-Averse Markov Decision Processes

Risk Aversion to Parameter Uncertainty in Markov Decision Processes with an Application to Slow-Onset Disaster Relief

First-order Policy Optimization for Robust Markov Decision Process

Risk-Averse Decision Making Under Uncertainty

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

Distributionally robust optimization for sequential decision-making

Transition Constrained Bayesian Optimization via Markov Decision Processes

Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs

Risk-Averse Markov Decision Processes Through a Distributional Lens

Risk-sensitive Markov Decision Process and Learning under General Utility Functions

A Statistical Perspective on Linear Programs with Uncertain Parameters

Risk probability optimization of finite horizon piecewise deterministic Markov decision processes

Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach