Double Replay Buffers with Restricted Gradient.

Linjing Zhang, Zongzhang Zhang
DOI: https://doi.org/10.1007/978-3-030-63833-7_25
2020-01-01
Abstract: In this paper, we consider the problem of balancing exploration and exploitation in deep reinforcement learning (DRL). We propose a generative method called double replay buffers with restricted gradient (DRBRG). DRBRG divides the replay buffer used in experience replay into two parts: the exploration buffer and the exploitation buffer. Because the two buffers use different retention policies, they increase sample diversity and help prevent the over-fitting caused by exploitation. To keep exploration from driving the current policy too far from past behavior, we introduce a gradient penalty that confines policy changes to a trust region. We compare our method with other experience-replay methods on continuous-action environments. Empirical results show that our method outperforms existing methods in both training performance and generalization performance.
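The two-buffer idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which retention policies DRBRG uses, so everything here is an assumption made for illustration: the class name `DoubleReplayBuffer`, the FIFO policy for the exploitation buffer, the reservoir-sampling policy for the exploration buffer, the even capacity split, and the `exploration_fraction` parameter are all hypothetical. The gradient penalty is not covered.

```python
import random
from collections import deque


class DoubleReplayBuffer:
    """Illustrative two-part replay buffer; not the authors' implementation."""

    def __init__(self, capacity, seed=0):
        half = capacity // 2
        # Assumed retention policy 1: FIFO, keeps only the most recent transitions.
        self.exploitation = deque(maxlen=half)
        # Assumed retention policy 2: reservoir sampling, so older transitions
        # can survive and keep the stored data diverse.
        self.exploration = []
        self.exploration_capacity = capacity - half
        self.seen = 0  # total number of transitions added so far
        self.rng = random.Random(seed)

    def add(self, transition):
        # The FIFO deque evicts its oldest entry automatically when full.
        self.exploitation.append(transition)
        self.seen += 1
        if len(self.exploration) < self.exploration_capacity:
            self.exploration.append(transition)
        else:
            # Reservoir sampling: each transition seen so far is retained
            # with equal probability.
            j = self.rng.randrange(self.seen)
            if j < self.exploration_capacity:
                self.exploration[j] = transition

    def sample(self, batch_size, exploration_fraction=0.5):
        # Draw part of the batch from each buffer, without replacement.
        n_explore = min(int(batch_size * exploration_fraction), len(self.exploration))
        n_exploit = min(batch_size - n_explore, len(self.exploitation))
        batch = self.rng.sample(self.exploration, n_explore)
        batch += self.rng.sample(list(self.exploitation), n_exploit)
        return batch
```

A training loop would call `add` after each environment step and `sample` before each update. Mixing batches from both buffers is one way to combine recent and older experience under these assumptions.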