Adaptive Cooperative Exploration for Reinforcement Learning from Imperfect Demonstrations

Fuxian Huang, Naye Ji, Huajian Ni, Shijian Li, Xi Li
DOI: https://doi.org/10.1016/j.patrec.2022.12.003
IF: 4.757
2023-01-01
Pattern Recognition Letters
Abstract: In reinforcement learning, exploration is an important way to learn new skills, but it is usually inefficient when faced with a huge state-action space or sparse extrinsic rewards. Generally, expert demonstrations can assist policy learning by leading the agent to imitate or explore these data. However, the demonstrations are often imperfect due to data collection noise or immature experts. To this end, we propose a novel adaptive cooperative exploration method that can effectively alleviate the issues of imperfect demonstrations and improve policy learning with them. Specifically, we propose a cooperative learning module that encourages two agents to explore diversely with it and then fuses the learned policies. Meanwhile, an adaptive self-supervised exploration method is presented to dynamically explore the demonstrations considering the environmental feedback. Therefore, the proposed method can achieve effective utilization of imperfect demonstrations for policy learning. Experimental results demonstrate the effectiveness of the proposed method on the MuJoCo benchmark. (c) 2022 Published by Elsevier B.V.