Explorer-Actor-Critic: Better Actors for Deep Reinforcement Learning
Junwei Zhang, Shuai Han, Xi Xiong, Sheng Zhu, Shuai Lü
DOI: https://doi.org/10.1016/j.ins.2024.120255
IF: 8.1
2024-02-02
Information Sciences
Abstract: Actor-critic deep reinforcement learning methods have demonstrated strong performance in many challenging decision-making and control tasks, but they also suffer from high sample complexity and overestimation bias. Current research focuses on using underestimation to balance overestimation and on reducing bias through ensemble learning, but these approaches introduce underestimation bias and excessive network costs. In this paper, we first analyze the effect of the action selection policy on estimation bias. Then, we propose the Explorer-Actor-Critic (EAC) method, which gives the actor a more conservative objective to reduce overestimation, introduces a learnable explorer to improve exploration ability, and uses an action mixing mechanism to mitigate experience distribution bias. Furthermore, we apply the EAC method to TD3 and SAC and verify its effectiveness through extensive comparison and ablation experiments. Our algorithm not only outperforms state-of-the-art algorithms but is also compatible with other actor-critic methods.
computer science, information systems
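The overestimation and underestimation biases mentioned in the abstract can be illustrated with a toy simulation. The sketch below is not the EAC method and is not taken from the paper. It assumes a hypothetical discrete-action setting in which every action has the same true value and two critics return independent Gaussian-noised estimates. The function name `estimation_bias` and all of its parameters are illustrative choices.

```python
import random
import statistics


def estimation_bias(num_actions=5, noise_std=1.0, trials=20000, seed=0):
    """Toy estimate of value-target bias under noisy critics.

    Returns a pair (single_bias, clipped_bias):
      single_bias  - bias of the max over one critic's noisy estimates
      clipped_bias - bias when the action is chosen with critic 1 and
                     valued with the minimum of critic 1 and critic 2
    Assumption: all actions share the true value 0, so any nonzero
    average is pure estimation bias.
    """
    rng = random.Random(seed)
    true_q = 0.0
    single, clipped = [], []
    for _ in range(trials):
        # Two independent noisy critics evaluating the same actions.
        q1 = [true_q + rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q2 = [true_q + rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        # Greedy target from a single critic: the max picks up positive noise.
        single.append(max(q1))
        # Clipped double-Q style target: select with q1, take the smaller value.
        best = max(range(num_actions), key=lambda i: q1[i])
        clipped.append(min(q1[best], q2[best]))
    return (statistics.mean(single) - true_q,
            statistics.mean(clipped) - true_q)


if __name__ == "__main__":
    over, under = estimation_bias()
    print(f"single-critic max bias: {over:+.3f}")
    print(f"clipped double-Q bias:  {under:+.3f}")
```

In this toy setting the single-critic target is biased upward and the clipped target is biased downward, which is the trade-off the abstract describes for existing approaches. How EAC's conservative actor objective, explorer, and action mixing address it is specified in the paper itself and is not reproduced here.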