DiffSkill: Improving Reinforcement Learning Through Diffusion-Based Skill Denoiser for Robotic Manipulation

Siao Liu, Yang Liu, Linqiang Hu, Ziqing Zhou, Yi Xie, Zhile Zhao, Wei Li, Zhongxue Gan
DOI: https://doi.org/10.1016/j.knosys.2024.112190
IF: 8.139
2024-01-01
Knowledge-Based Systems
Abstract: Although Reinforcement Learning (RL) has demonstrated impressive success in various applications, complex robotic manipulation tasks remain a formidable challenge. Recently, skill-based approaches have been proposed that extract reusable skills from offline data and encode them into a latent space, leveraging prior knowledge to accelerate robot learning. However, existing skill learning methods predominantly rely on regularization constraints or reversible mappings to guide skill prior generation, and they lack explicit control over the trade-off between exploiting offline knowledge and exploring novel skill behaviors. In this paper, we point out that the challenge of skill exploration lies in the noise within skill embeddings, and we propose DiffSkill, a novel denoising-based, skill-based RL framework. Specifically, DiffSkill integrates a diffusion-based skill denoiser into the hierarchical architecture, effectively bridging the gap between offline knowledge and learned skill prior embeddings through iterative denoising. Nevertheless, incorporating diffusion models into the skill-based RL framework for robot control faces two main challenges: (i) uncertain noise levels of skill embeddings and (ii) action oscillation during skill transitions. To address these, we propose a cycle anneal scheduler for dynamic timestep adjustment and an online momentum smoothing strategy that mitigates oscillations during skill transitions, resulting in more stable and superior performance. Extensive comparison experiments across six challenging robotic manipulation tasks demonstrate that DiffSkill consistently outperforms state-of-the-art methods by a significant margin in all downstream tasks. Ablation studies and additional discussions further validate the effectiveness of each component and strategy.
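The abstract names two stabilizing components, a cycle anneal scheduler and online momentum smoothing, but gives no formulas for either. The sketch below is one hypothetical reading, not the authors' implementation. It assumes that the scheduler decays the denoising timestep linearly within repeating cycles (in the spirit of cyclical annealing schedules) and that momentum smoothing is an exponential moving average (EMA) over consecutive action vectors. The names `cycle_anneal_timestep`, `MomentumSmoother`, `cycle_len`, `t_max`, `t_min`, and `beta` are illustrative and do not come from the paper.

```python
from typing import List, Optional


def cycle_anneal_timestep(step: int, cycle_len: int, t_max: int, t_min: int = 1) -> int:
    """Hypothetical cyclical schedule for the denoising timestep.

    Within each cycle of `cycle_len` steps, the timestep decays linearly
    from t_max to t_min; the next cycle restarts at t_max.
    """
    if cycle_len < 2 or not 1 <= t_min <= t_max:
        raise ValueError("need cycle_len >= 2 and 1 <= t_min <= t_max")
    pos = step % cycle_len                 # position inside the current cycle
    frac = pos / (cycle_len - 1)           # 0.0 at cycle start, 1.0 at cycle end
    return round(t_max - frac * (t_max - t_min))


class MomentumSmoother:
    """Hypothetical online EMA smoother for consecutive action vectors."""

    def __init__(self, beta: float = 0.8) -> None:
        if not 0.0 <= beta < 1.0:
            raise ValueError("beta must lie in [0, 1)")
        self.beta = beta                   # weight kept from the previous output
        self.prev: Optional[List[float]] = None

    def reset(self) -> None:
        """Forget history, e.g. at the start of a new episode."""
        self.prev = None

    def __call__(self, action: List[float]) -> List[float]:
        if self.prev is None:
            smoothed = list(action)        # first action passes through unchanged
        else:
            smoothed = [self.beta * p + (1.0 - self.beta) * a
                        for p, a in zip(self.prev, action)]
        self.prev = smoothed
        return smoothed


if __name__ == "__main__":
    smoother = MomentumSmoother(beta=0.5)
    for step, raw in enumerate([[0.0], [1.0], [1.0]]):
        t = cycle_anneal_timestep(step, cycle_len=5, t_max=10, t_min=2)
        print(step, t, smoother(raw))
```

Run as a script, the demo prints timesteps 10, 8, 6 for the first three steps, alongside smoothed actions [0.0], [0.5], [0.75]. In this sketch, a larger `beta` damps oscillation more strongly but responds more slowly to a genuine skill switch.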