Dynamic Policy Programming with Descending Regularization for Efficient Reinforcement Learning Control

Renxing Li, Zhiwei Shang, Chunhua Zheng, Huiyun Li, Qing Liang, Yunduan Cui
DOI: https://doi.org/10.1109/prai55851.2022.9904283
2022-01-01
Abstract: In this work, a novel value-function-based reinforcement learning (RL) approach, descending dynamic policy programming (DDPP), is proposed to address sample efficiency and learning stability in control problems. DDPP extends dynamic policy programming (DPP), a state-of-the-art RL method that uses Kullback-Leibler (KL) divergence regularization to penalize overly large policy updates during learning. It adds a descending strategy for the regularization parameters that dynamically controls the strength of the penalty term. Evaluated on several benchmark control tasks in OpenAI Gym, the proposed method shows better learning stability and sample efficiency than the related baseline approaches. This indicates that descending KL divergence regularization has strong potential for more practical applications of RL.
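The abstract names the mechanism but gives no equations, so here is a minimal tabular sketch for orientation. It is not the authors' implementation. It assumes the commonly stated DPP recursion, Psi_{k+1}(s,a) = Psi_k(s,a) + r(s,a) + gamma * E[M_eta Psi_k(s')] - M_eta Psi_k(s). Here M_eta is the Boltzmann soft-max operator with inverse temperature eta, and 1/eta plays the role of the KL penalty weight. The "descending strategy" is stood in for by a hypothetical exponential decay of that weight (`kl_weight_schedule`, with made-up parameters `lam0`, `lam_min`, `decay`). The paper's actual schedule, and which parameters it descends, may differ.

```python
import numpy as np

def kl_weight_schedule(k, lam0=1.0, lam_min=0.01, decay=0.9):
    """Hypothetical descending KL penalty weight (lambda = 1/eta) at iteration k.
    Assumption: exponential decay floored at lam_min; not taken from the paper."""
    return max(lam_min, lam0 * decay ** k)

def boltzmann_policy(psi, eta):
    """pi(a|s) proportional to exp(eta * Psi(s, a)), computed in a numerically stable way."""
    z = eta * psi
    z = z - z.max(axis=1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)

def boltzmann_operator(psi, eta):
    """M_eta Psi(s) = sum_a pi(a|s) * Psi(s, a), the soft-max operator used in DPP."""
    return (boltzmann_policy(psi, eta) * psi).sum(axis=1)

def ddpp_sketch(P, R, gamma=0.95, iters=200, **schedule_kwargs):
    """Tabular DPP iterations with a descending KL weight (illustrative only).
    P: transition probabilities, shape (S, A, S); R: rewards, shape (S, A)."""
    S, A = R.shape
    psi = np.zeros((S, A))
    for k in range(iters):
        # A smaller KL weight means a larger eta, i.e. a weaker penalty on policy change.
        eta = 1.0 / kl_weight_schedule(k, **schedule_kwargs)
        m = boltzmann_operator(psi, eta)  # shape (S,)
        # DPP recursion: Psi <- Psi + R + gamma * E_{s'}[M Psi(s')] - M Psi(s)
        psi = psi + R + gamma * (P @ m) - m[:, None]
    final_eta = 1.0 / kl_weight_schedule(iters, **schedule_kwargs)
    return psi, boltzmann_policy(psi, final_eta)

# Toy MDP (hypothetical): 2 states, 2 actions. Action 0 pays reward 1, action 1 pays 0,
# and both actions share the same transition probabilities.
P = np.full((2, 2, 2), 0.5)
R = np.array([[1.0, 0.0], [1.0, 0.0]])
psi, pi = ddpp_sketch(P, R)
print(psi.argmax(axis=1))  # -> [0 0]
```

As the penalty weight shrinks, the Boltzmann policy becomes greedier. Early iterations therefore change the policy cautiously, while later ones commit to the learned preferences. This is one plausible reading of how a descending regularizer trades stability for final performance.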