Efficient distributional reinforcement learning with Kullback-Leibler divergence regularization

Renxing Li, Zhiwei Shang, Chunhua Zheng, Huiyun Li, Qing Liang, Yunduan Cui
DOI: https://doi.org/10.1007/s10489-023-04867-z
IF: 5.3
2023-07-29
Applied Intelligence
Abstract: In this article, we address the issues of stability and data efficiency in reinforcement learning (RL). We propose a novel RL approach, Kullback-Leibler divergence-regularized distributional RL (KL-C51), which integrates the stability of distributional RL and the data efficiency of Kullback-Leibler (KL) divergence-regularized RL in one framework. KL-C51 derives the Bellman equation and the temporal-difference (TD) errors regularized by KL divergence from a distributional perspective, and explores approximation strategies for properly mapping the corresponding Boltzmann softmax term into distributions. The method is evaluated on several benchmark tasks of varying complexity from OpenAI Gym and on six Atari 2600 games from the Arcade Learning Environment. The results clearly illustrate the positive effect of KL divergence regularization on distributional RL, including distinctive exploration behaviors and smooth value-function updates, and demonstrate improved learning stability and data efficiency compared with related baseline approaches.
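The abstract names two building blocks that KL-C51 combines: the categorical projection used by C51-style distributional RL, and the Boltzmann softmax operator that appears in KL-regularized value updates. The sketch below illustrates each one in isolation, in plain Python. It is not the authors' KL-C51 update, because the paper's strategy for mapping the softmax term into distributions is not reproduced here. The function names, the fixed evenly spaced support, and the inverse temperature `eta` are assumptions made for illustration.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: generic C51 and softmax building blocks,
# NOT the KL-C51 update described in the paper.


def boltzmann_softmax(values: Sequence[float], eta: float) -> float:
    """Boltzmann softmax operator: sum_a softmax(eta * v)_a * v_a.

    As eta grows this approaches max(values); at eta = 0 it equals the mean.
    Values are shifted by their maximum before exponentiating for stability.
    """
    m = max(values)
    weights = [math.exp(eta * (v - m)) for v in values]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def categorical_projection(
    support: Sequence[float],
    probs: Sequence[float],
    reward: float,
    gamma: float,
    done: bool = False,
) -> List[float]:
    """Project the distribution of reward + gamma * Z onto a fixed, evenly spaced support."""
    n = len(support)
    v_min, v_max = support[0], support[-1]
    dz = (v_max - v_min) / (n - 1)
    projected = [0.0] * n
    for z_j, p_j in zip(support, probs):
        # Distributional Bellman backup of one atom, clipped to the support range.
        tz = reward + (0.0 if done else gamma * z_j)
        tz = min(max(tz, v_min), v_max)
        # Fractional index of the backed-up atom, clamped against float overshoot.
        b = min(max((tz - v_min) / dz, 0.0), n - 1.0)
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:
            # The atom lands exactly on a support point: it keeps all its mass.
            projected[lower] += p_j
        else:
            # Split the mass between the two neighbouring atoms, linearly in distance.
            projected[lower] += p_j * (upper - b)
            projected[upper] += p_j * (b - lower)
    return projected


if __name__ == "__main__":
    print(categorical_projection([-1.0, 0.0, 1.0], [0.0, 1.0, 0.0], reward=0.5, gamma=1.0))
    print(boltzmann_softmax([1.0, 2.0, 3.0], eta=1.0))
```

For example, with support [-1, 0, 1], a next-state distribution concentrated on 0, reward 0.5 and gamma 1, the backed-up atom lands at 0.5 and its mass is split evenly between the atoms at 0 and 1. Clamping the fractional index `b` guards against floating-point overshoot past the last atom.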
computer science, artificial intelligence
What problem does this paper attempt to address?