Abstract: In this paper we investigate the tractability of robust Markov Decision Processes (RMDPs) under various structural assumptions on the uncertainty set. Surprisingly, we show that in all generality (i.e. without any assumption on the instantaneous rewards), s-rectangular and sa-rectangular uncertainty sets are the only models of uncertainty that are tractable. Our analysis also shows that existing non-rectangular models, including r-rectangular uncertainty and new generalizations, are only weakly tractable in that they require an additional structural assumption that the instantaneous rewards do not depend on the next state, and in this case they are equivalent to rectangular models, which severely undermines their significance and usefulness. Interestingly, our proof techniques rely on identifying a novel simultaneous solvability property, which we show is at the heart of several important properties of RMDPs, including the existence of stationary optimal policies and dynamic programming-based formulations. The simultaneous solvability property enables a unified approach to studying the tractability of all existing models of uncertainty, rectangular and non-rectangular alike.

What problem does this paper attempt to address?

This paper addresses the tractability of Robust Markov Decision Processes (RMDPs) under different assumptions on the uncertainty set. Specifically, the authors aim to determine which uncertainty models make a robust MDP computationally tractable, and to provide a unified method for studying the tractability of all existing uncertainty models.

### Main research questions of the paper

1. **Which uncertainty models lead to tractable robust MDPs?**
2. **Is there a unified method for studying the tractability of all existing uncertainty models?**

### Background and motivation

In a standard Markov decision process with known model parameters, an optimal policy can be found efficiently by methods such as value iteration, policy iteration, linear programming, or gradient descent. When the parameter estimates are inaccurate, however, performance can degrade severely. Robust MDPs mitigate this by optimizing for the pessimistic case, that is, by maximizing the worst-case return. For general uncertainty sets, though, even the policy evaluation problem can be computationally hard (NP-hard). A large body of literature is therefore dedicated to finding sufficient conditions under which robust MDPs are tractable.

### Main contributions

1. **Rectangular models are the only tractable models**: The authors prove that, without any assumption on the instantaneous rewards, only s-rectangular and sa-rectangular uncertainty models are tractable. The proof goes through the relationship between dynamic programming and the Simultaneous Solvability Property (SSP).
2. **Weakly tractable robust MDPs**: The authors further study weak tractability under an additional structural assumption (the rewards do not depend on the next state) and prove that weak tractability is equivalent to a weaker simultaneous solvability property.
3. **New perspective**: The results answer several important open questions in the robust MDP literature and clarify aspects that were previously poorly understood. In particular, they emphasize the central role of rectangular models in robust MDPs and show that existing non-rectangular models, in the setting where they are weakly tractable, are equivalent to rectangular models.

### Conclusion

This paper provides a theoretical basis for the study of robust MDPs, especially regarding the tractability of uncertainty models. The authors not only give necessary conditions but also provide a unified framework for evaluating the tractability of different uncertainty models, which is significant for future research.

### Formula summary

- **Dynamic programming equation**:
  \[ u^\pi_s = \min_{P \in \mathcal{P}} \sum_{a \in \mathcal{A}} \pi(s,a)\, P_s^\top (r_s^a + \gamma u^\pi), \quad \forall s \in \mathcal{S} \]
- **Simultaneous Solvability Property (SSP)**:
  \[ \bigcap_{s \in \mathcal{S}} \arg\min_{P \in \mathcal{P}} \langle P_s, V_s \rangle \neq \emptyset \]

These formulas show how the tractability of robust MDPs can be analyzed through dynamic programming and the simultaneous solvability property.
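The two formulas above can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not the authors' implementation: the uncertainty set is taken to be a finite list of transition kernels (the paper considers general sets), each kernel is stored as an array indexed by (state, action, next state), and the function names `robust_policy_evaluation` and `ssp_holds`, the toy kernels, and the matrix `V` are all hypothetical choices made for this example.

```python
import numpy as np

def robust_policy_evaluation(kernels, r, pi, gamma, iters=500):
    """Iterate u_s = min_P sum_a pi[s,a] * P[s,a,:] @ (r[s,a,:] + gamma * u).

    kernels: (K, S, A, S) finite uncertainty set (assumption of this sketch)
    r:       (S, A, S) rewards, possibly depending on the next state
    pi:      (S, A) stationary policy
    """
    kernels = np.asarray(kernels, dtype=float)
    u = np.zeros(kernels.shape[1])
    for _ in range(iters):
        target = r + gamma * u  # u is broadcast over the next-state axis
        # q[k, s] = sum_a pi[s,a] * sum_t P_k[s,a,t] * target[s,a,t]
        q = np.einsum("sa,ksat,sat->ks", pi, kernels, target)
        u = q.min(axis=0)       # the minimum is taken separately for each state
    return u

def ssp_holds(kernels, V, tol=1e-9):
    """Check whether a single kernel minimizes <P_s, V_s> for every state s."""
    kernels = np.asarray(kernels, dtype=float)
    vals = np.einsum("ksat,sat->ks", kernels, V)  # vals[k, s] = <P_k[s], V[s]>
    is_min = vals <= vals.min(axis=0) + tol       # per-state argmin membership
    return bool(is_min.all(axis=1).any())         # is the intersection non-empty?

# Toy instance: 2 states, 1 action. Each row is a next-state distribution.
to0, to1 = [1.0, 0.0], [0.0, 1.0]

def kernel(row0, row1):
    return [[row0], [row1]]  # shape (S=2, A=1, S=2)

# Rectangular set: every combination of per-state rows is allowed.
rectangular = [kernel(a, b) for a in (to0, to1) for b in (to0, to1)]
# Coupled (non-rectangular) set: the row of state 1 is tied to the row of state 0.
coupled = [kernel(to0, to1), kernel(to1, to0)]

V = np.array([[[0.0, 1.0]], [[0.0, 1.0]]])  # cost 1 for landing in state 1

print(ssp_holds(rectangular, V))  # True
print(ssp_holds(coupled, V))      # False
print(robust_policy_evaluation(rectangular, V, np.ones((2, 1)), 0.5))
```

In the coupled example, each state's minimum is attained by a different kernel, so no single kernel attains both minima at once. This is the situation the simultaneous solvability property rules out; in the rectangular example the per-state minimizers can always be combined into one kernel.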

Tractable Robust Markov Decision Processes
