Hierarchical Reinforcement Learning from Imperfect Demonstrations Through Reachable Coverage-Based Subgoal Filtering

Yu Tang, Shangqi Guo, Jinhui Liu, Bo Wan, Lingling An, Jian K. Liu
DOI: https://doi.org/10.1016/j.knosys.2024.111736
IF: 8.139
2024-01-01
Knowledge-Based Systems
Abstract: Reinforcement learning (RL) has shown remarkable success in navigating complex robotic and gaming landscapes. However, achieving such results often requires a substantial number of interaction episodes between the agent and its environment, especially in scenarios with sparse and long-term rewards. Although expert demonstrations and hierarchical structures can enhance the sample efficiency of RL, noise in expert demonstrations may degrade performance. Here we address this challenge by introducing a novel measurement, the noise elimination factor with reachable coverage, to quantify the noise in trajectory demonstrations. We propose a filtering method based on this measure, which effectively eliminates noise that deviates from the main demonstration clusters and mitigates the adverse impact of imperfect demonstrations, particularly in hierarchical reinforcement learning. To optimize the utilization of filtered demonstrations, we further eliminate similar and redundant instances, constructing a concise and semantically clear demonstration set for subgoal graph construction. This culminates in the development of a Reachable Coverage-based Hierarchical Reinforcement Learning method (RCHRL). Experimental validation in complex robot control tasks and Maze environments demonstrates the efficacy of our approach in removing demonstration noise, surpassing recent state-of-the-art demonstration-guided reinforcement learning methods in terms of both asymptotic performance and stability. Our code is available at https://github.com/YuTang06/RCHRL.