Boltzmann Exploration for Deterministic Policy Optimization.

Shaochen Wang, Yuan Pu, Shangtong Yang, Xin Yao, Bin Li
DOI: https://doi.org/10.1007/978-3-030-63833-7_18
2020-01-01
Abstract: Gradient-based reinforcement learning has attracted growing attention. Deep Deterministic Policy Gradient (DDPG), one of the most important methods of this kind, has achieved remarkable success and has been applied to many challenging continuous-control scenarios. However, it still suffers from unstable training on off-policy data and premature convergence to a local optimum. To address these problems, we combine Boltzmann exploration with the deterministic policy gradient. The candidate policy is represented by a Boltzmann distribution and updated by Kullback-Leibler (KL) projection. Introducing the Boltzmann policy encourages exploration, which effectively prevents the policy from collapsing too quickly. Experimental results show that the proposed algorithm outperforms DDPG on most tasks in the MuJoCo continuous-control benchmark.
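As a rough illustration of what Boltzmann exploration means in general, the sketch below samples an action with probability proportional to exp(Q / temperature). It is not the algorithm from the paper: the finite candidate set, the function names `boltzmann_probs` and `sample_action`, and the `temperature` parameter are all assumptions made for this example, and the KL projection step mentioned in the abstract is not shown.

```python
import math
import random


def boltzmann_probs(q_values, temperature=1.0):
    """Return softmax(Q / temperature) over a finite list of action values.

    Hypothetical helper: the abstract does not specify this interface.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtracting the maximum keeps exp() from overflowing and leaves
    # the normalized probabilities unchanged.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(candidates, q_values, temperature=1.0, rng=random):
    """Draw one candidate action according to its Boltzmann probability."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]
```

A high temperature flattens the distribution toward uniform sampling (more exploration), while a low temperature concentrates probability on the highest-valued action, approaching a greedy, deterministic choice.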