Non-asymptotic Performances of Robust Markov Decision Processes

Wenhao Yang, Zhihua Zhang
2021-01-01
Abstract: In this paper, we study the non-asymptotic performance of the optimal robust policy, evaluated on the robust value function with the true transition dynamics. The optimal robust policy is solved from a generative model or an offline dataset, without access to the true transition dynamics. In particular, we consider three different uncertainty sets, the L1, χ² and KL balls, under both the (s, a)-rectangular and s-rectangular assumptions. Our results show that, under the (s, a)-rectangular assumption on the uncertainty sets, the sample complexity is about Õ(|S||A| / (ερ(1−γ))) in the generative model setting and Õ(|S| / (ν_min ερ(1−γ))) in the offline dataset setting. While prior work on non-asymptotic performance is restricted to the KL ball and the (s, a)-rectangular assumption, we also extend our results to the more general s-rectangular assumption, which leads to a larger sample complexity than the (s, a)-rectangular assumption.