Taking Complementary Advantages: Improving Exploration Via Double Self-Imitation Learning in Procedurally-Generated Environments

Hao Lin, Yue He, Fanzhang Li, Quan Liu, Bangjun Wang, Fei Zhu
DOI: https://doi.org/10.1016/j.eswa.2023.122145
IF: 8.5
2024-01-01
Expert Systems with Applications
Abstract: Efficient exploration is the core issue of deep reinforcement learning. Although state-of-the-art exploration methods have made considerable progress on many tasks, they usually underperform in procedurally-generated environments, indicating that the agent generalizes poorly. To address this problem, a self-imitation exploration approach for procedurally-generated environments, referred to as Double Self-Imitation Learning (DSIL), is proposed. DSIL screens out good past exploration experiences using an episode scoring rule that considers local scores, global scores and external rewards. DSIL then employs a cooperation strategy that reproduces the agent’s past good exploration behaviors by combining generative adversarial imitation learning (GAIL) and behavioral cloning (BC). Specifically, DSIL is composed of a reinforcement learning module and a discriminator. The discriminator generates intrinsic rewards by judging how similar the current state–action pairs are to the past good exploration experiences. The agent’s policy is optimized alternately by the BC task and by the reinforcement learning algorithm in the GAIL task; meanwhile, the reinforcement learning module and the discriminator are updated alternately within the GAIL task. Experiments on several procedurally-generated environments demonstrated that DSIL significantly outperformed existing exploration approaches in sample efficiency and performance; that is, DSIL gives the agent stronger generalization.
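The screening step and the discriminator-based intrinsic reward described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the weighted-sum form of `episode_score`, its weights, the fixed-capacity top-k buffer `GoodEpisodeBuffer`, and the reward form `-log(1 - D(s, a))`, which is one form commonly used with GAIL. The abstract only states that the scoring rule considers local scores, global scores and external rewards.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(order=True)
class ScoredEpisode:
    # Episodes are ordered by score only; transitions are excluded from comparison.
    score: float
    transitions: List[Tuple[int, int]] = field(compare=False, default_factory=list)


def episode_score(local: float, global_: float, ext_return: float,
                  w_local: float = 1.0, w_global: float = 1.0,
                  w_ext: float = 1.0) -> float:
    """Hypothetical scoring rule: a weighted sum of the three signals (assumption)."""
    return w_local * local + w_global * global_ + w_ext * ext_return


class GoodEpisodeBuffer:
    """Keeps the `capacity` highest-scoring episodes seen so far (assumed design)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._heap: List[ScoredEpisode] = []  # min-heap: worst kept episode at index 0

    def add(self, episode: ScoredEpisode) -> None:
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, episode)
        elif episode.score > self._heap[0].score:
            # Replace the lowest-scoring stored episode with the better one.
            heapq.heapreplace(self._heap, episode)

    def episodes(self) -> List[ScoredEpisode]:
        # Best episodes first.
        return sorted(self._heap, reverse=True)


def gail_intrinsic_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Assumed GAIL-style reward: larger when the discriminator output d_prob,
    read as the probability that (s, a) resembles stored good experience, is higher."""
    return -math.log(max(1.0 - d_prob, eps))
```

In this sketch, the stored episodes would serve as demonstrations for both the BC task and the discriminator, while `gail_intrinsic_reward` would supply the reward for the reinforcement learning update; how DSIL actually schedules these alternating updates is not specified in the abstract.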