Finding Near Optimal Policies via Reducive Regularization in Markov Decision Processes

Wenhao Yang, Xiang Li, Guangzeng Xie, Zhihua Zhang
2021-01-01
Abstract: Regularized Markov decision processes (MDPs) serve as a smooth version of ordinary MDPs that encourages exploration. Given a regularized MDP, however, the optimal policy is often biased with respect to the value function of the original MDP. Rather than fixing the coefficient λ of the regularization term at a sufficiently small value, we propose a scheme that reduces λ to approximate the optimal policy of the original MDP. We prove that the iteration complexity of obtaining an ε-optimal policy can be maintained or even reduced, compared with setting a sufficiently small λ, in both dynamic programming and policy gradient methods. In addition, there exists a strong duality connection between the reduction method and solving the original MDP directly, from which we can derive more adaptive reduction methods for certain reinforcement learning algorithms.
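To make the idea of reducing λ concrete, the following is a minimal sketch and not the authors' algorithm. It assumes an entropy regularizer (so the regularized Bellman backup becomes a λ-scaled log-sum-exp), a halving schedule for λ, and a small random tabular MDP. The function names, the schedule, and all constants are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def value_iteration(P, R, gamma, n_iters):
    """Standard (unregularized) value iteration; P has shape (S, A, S), R has shape (S, A)."""
    V = np.zeros(R.shape[0])
    for _ in range(n_iters):
        V = (R + gamma * P @ V).max(axis=1)
    return V

def soft_value_iteration(P, R, gamma, lam, V0, n_iters):
    """Entropy-regularized backup: V(s) <- lam * log sum_a exp(Q(s, a) / lam)."""
    V = V0.copy()
    for _ in range(n_iters):
        Q = R + gamma * P @ V                      # shape (S, A)
        m = Q.max(axis=1)                          # subtract the max for numerical stability
        V = m + lam * np.log(np.exp((Q - m[:, None]) / lam).sum(axis=1))
    return V

def reduced_regularization(P, R, gamma, lam0, n_stages, iters_per_stage):
    """Assumed schedule: run a few regularized backups, halve lam, warm-start the next stage."""
    V = np.zeros(R.shape[0])
    lam = lam0
    for _ in range(n_stages):
        V = soft_value_iteration(P, R, gamma, lam, V, iters_per_stage)
        lam *= 0.5
    return V

# Hypothetical random MDP used only for illustration.
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)                  # normalize into transition probabilities
R = rng.random((S, A))

V_star = value_iteration(P, R, gamma, 500)
V_reduced = reduced_regularization(P, R, gamma, lam0=1.0, n_stages=12, iters_per_stage=30)
V_fixed = soft_value_iteration(P, R, gamma, 1.0, np.zeros(S), 360)   # same budget, lam held at 1

print("max gap, reduced lam:", np.abs(V_reduced - V_star).max())
print("max gap, fixed lam  :", np.abs(V_fixed - V_star).max())
```

With the same total number of backups, the run that shrinks λ should end much closer to the unregularized optimal values than the run that keeps λ fixed at its initial value, which is the bias the abstract refers to.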