Self-imitation guided goal-conditioned reinforcement learning

Yao Li,YuHui Wang,XiaoYang Tan
DOI: https://doi.org/10.1016/j.patcog.2023.109845
IF: 8
2023-08-04
Pattern Recognition
Abstract:Goal-conditioned reinforcement learning (GCRL) aims to control agents to reach desired goals, which poses a significant challenge due to task-specific variations in configurations. However, current GCRL methods suffer from limitations in sample efficiency and the need for substantial training data. While existing self-imitation-based GCRL approaches can improve sample efficiency, their scalability to large-scale tasks is limited. In this paper, we propose integrating self-imitation learning with goal-conditioned RL methods into a compatible and reasonable framework. Specifically, we introduce a novel target action value function to aggregate self-imitation learning and goal-conditioned reinforcement learning. The designed target value effectively combines these two policy training mechanisms to accomplish specific tasks. Moreover, we theoretically demonstrate that our approach can learn a superior policy compared to both self-imitation learning and goal-conditioned reinforcement learning. Additionally, experimental results showcase the stability and effectiveness of our method compared to existing approaches in various challenging robotic control tasks.
computer science, artificial intelligence,engineering, electrical & electronic
What problem does this paper attempt to address?