Experience Consistency Distillation Continual Reinforcement Learning for Robotic Manipulation Tasks

Zhao Chao,Xu Jie,Peng Ru,Chen Xingyu,Mei Kuizhi,Lan Xuguang
DOI: https://doi.org/10.1109/icra57147.2024.10611494
2024-01-01
Abstract:Continual reinforcement learning, which aims to help robots acquire skills without catastrophic forgetting, obviating the need to re-learn all tasks from scratch. In order to enable lifelong acquisition of skills in robots, replay-based continual reinforcement learning has emerged as a promising research direction. These techniques replay data from previous tasks to mitigate forgetting when learning new skills. However, existing replay-based methods store poor representative experience, and the experience utilization of old tasks is inefficient. To address these issues, we propose a experience consistency distillation method for robot continual reinforcement learning to improve the data efficiency of the experience. Specifically, the experience of old tasks are distilled to obtain Markov Decision Process (MDP) data with high compression ratio and information content. To ensure consistent data distributions before and after distillation, we further utilize a Fréchet Inception Distance (FID) loss as a regularization constraint. In order to improve experience utilization efficiency, the policy is then trained using both the distilled data and current task data, with policy distillation performed based on uncertainty metrics. Our method is validated in the continual reinforcement learning simulation platform and real scene with a UR5e robot arm. Experimental results indicate that our method achieves higher success and lower buffer size requirement compared to other methods.
What problem does this paper attempt to address?