Exploring Policy Diversity in Parallel Actor-Critic Learning

Yanqiang Zhang, Yuanzhao Zhai, Gongqian Zhou, Bo Ding, Dawei Feng, Songwang Liu
DOI: https://doi.org/10.1109/ictai56018.2022.00182
2022-01-01
Abstract: Exploration is a critical challenge for deep reinforcement learning methods. Although existing approaches such as actor-critic algorithms have made considerable progress, most still suffer from sample inefficiency in complex environments where rewards are sparse. Parallel sampling, in which multiple actors sharing the same policy interact with the environment, is an effective way to improve sample efficiency. However, parallel parameter-sharing actors collect similar samples, which limits how much the overall exploration process can improve. In this paper, we propose a Policy Diversity enhanced approach for parallel Actor-Critic (PDAC). Specifically, we extend the parallel actor-critic architecture to the PDAC framework, which consists of a shared critic and multiple distinct parallel actors. We then use the KL divergence between the action probability distributions of parallel actors as an intrinsic reward that encourages the actors to explore diverse strategies. We evaluate our approach on multiple challenging procedurally generated tasks and compare it with state-of-the-art algorithms. Experiments show that PDAC achieves significant improvements over these baselines in terms of cumulative reward and sample efficiency.
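As a rough illustration of the diversity-based intrinsic reward described in the abstract, the Python sketch below computes a per-actor bonus from the actors' action distributions at a single state. The abstract does not give the exact formulation, so several choices here are assumptions: discrete actions, the KL direction KL(pi_i || pi_j), averaging over all other actors, and the scaling coefficient `beta`. The function names `kl_divergence` and `diversity_bonus` are hypothetical and are not taken from the PDAC implementation.

```python
import numpy as np


def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    """KL(p || q) between two discrete action distributions."""
    # Clip probabilities away from zero so log() stays finite.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


def diversity_bonus(action_probs: np.ndarray, beta: float = 0.1) -> np.ndarray:
    """Hypothetical per-actor intrinsic reward at one state.

    action_probs has shape (n_actors, n_actions); row i is actor i's
    action distribution pi_i(. | s). Entry i of the result is
    beta * mean_{j != i} KL(pi_i || pi_j). The averaging and the beta
    coefficient are assumptions, not details confirmed by the paper.
    """
    probs = np.asarray(action_probs, dtype=np.float64)
    n_actors = probs.shape[0]
    if n_actors < 2:
        raise ValueError("need at least two actors to measure diversity")
    bonus = np.zeros(n_actors)
    for i in range(n_actors):
        # KL from actor i to every other actor, then averaged.
        kls = [kl_divergence(probs[i], probs[j]) for j in range(n_actors) if j != i]
        bonus[i] = beta * np.mean(kls)
    return bonus


if __name__ == "__main__":
    # Identical policies: every pairwise KL is zero, so there is no bonus.
    same = np.array([[0.5, 0.5], [0.5, 0.5]])
    # Opposing policies: each actor is rewarded for differing from the other.
    different = np.array([[0.9, 0.1], [0.1, 0.9]])
    print(diversity_bonus(same))
    print(diversity_bonus(different))

    # Combine with an illustrative extrinsic reward for each actor.
    extrinsic = np.array([0.0, 1.0])
    print(extrinsic + diversity_bonus(different))
```

In a full training loop, a bonus like this would presumably be added to the environment reward before the shared critic's value targets are computed. How PDAC actually weights or schedules the two reward terms is not stated in the abstract.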