Toward Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics

Wenhao Yang, Liangyu Zhang, Zhihua Zhang
DOI: https://doi.org/10.1214/22-aos2225
2022-01-01
Abstract: In this paper, we study the non-asymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are solved only from a generative model. While prior work on the non-asymptotic performance of robust MDPs is restricted to the setting of the KL uncertainty set and the (s, a)-rectangular assumption, we improve their results and also consider other uncertainty sets, including $L_1$ and $\chi^2$ balls. Our results show that under the (s, a)-rectangular assumption on the uncertainty sets, the sample complexity is about $\widetilde{O}\left(\frac{|S|^2|A|}{\varepsilon^2\rho^2(1-\gamma)^4}\right)$. In addition, we extend our results from the (s, a)-rectangular assumption to the s-rectangular assumption. In this scenario, the sample complexity varies with the choice of uncertainty sets and is generally larger than in the (s, a)-rectangular case. Moreover, we also show that the optimal robust value function is asymptotically normal with a typical rate $\sqrt{n}$ under both the (s, a)-rectangular and s-rectangular assumptions, from both theoretical and empirical perspectives.
∗Academy for Advanced Interdisciplinary Studies, Peking University; email: yangwenhaosms@pku.edu.cn.
†Academy for Advanced Interdisciplinary Studies, Peking University; email: zhangliangyu@pku.edu.cn.
‡School of Mathematical Sciences, Peking University; email: zhzhang@math.pku.edu.cn.
arXiv:2105.03863v2 [stat.ML] 9 Oct 2021