HRL2E: Hierarchical Reinforcement Learning with Low-level Ensemble

You Qin, Zhi Wang, Chunlin Chen
DOI: https://doi.org/10.1109/ijcnn55064.2022.9892189
2022-01-01
Abstract: Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach to solving challenging tasks with sparse rewards and long horizons. However, it suffers from non-stationarity because the low level is unstable while it is still being updated. To stabilize the low level more quickly and shorten the non-stationary stage, we propose a novel HRL method: Hierarchical Reinforcement Learning with Low-level Ensemble (HRL2E). In HRL2E, the high level generates goals as high-level actions based on current states. The low level, made up of several homogeneous policies, then attempts to complete these goals within a specific timestep budget. The improvements of our approach over general goal-conditioned HRL algorithms can be summarized in two aspects. First, we estimate the target value function with the ensemble, stabilizing the training process. Second, we propose the Gates module, composed of several scoring machines, to score each low-level policy and judge which one is most likely to succeed at a specific goal. We adopt Twin Delayed Deep Deterministic Policy Gradient (TD3) at each level. Experimental comparison between our method and state-of-the-art goal-conditioned HRL methods on challenging continuous control tasks in MuJoCo domains shows that our method can significantly accelerate training.
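The two components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the ensemble's value estimates are aggregated or how the Gates scores are produced, so the code below assumes (hypothetically) that each ensemble member contributes a TD3-style clipped double-Q value, that these are averaged into one target, and that the policy with the highest gate score is chosen. The function names `select_policy` and `ensemble_td_target` are illustrative only.

```python
import numpy as np


def select_policy(gate_scores):
    """Pick the low-level policy with the highest gate score.

    gate_scores: array of shape (K,), one score per low-level policy.
    Assumption: selection is a plain argmax over the scores.
    """
    return int(np.argmax(gate_scores))


def ensemble_td_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Compute a TD target from an ensemble of twin target critics.

    next_q1, next_q2: arrays of shape (K,), the two target-critic values
    of each of the K ensemble members at the next state-action pair.
    Assumption: TD3's clipped double-Q (element-wise minimum) is applied
    per member, then the members are averaged. The averaging rule is a
    guess, not taken from the paper.
    """
    per_member = np.minimum(next_q1, next_q2)  # clipped double-Q per member
    return reward + gamma * (1.0 - done) * float(np.mean(per_member))


# Example: three policies scored for one goal, two-member critic ensemble.
chosen = select_policy(np.array([0.1, 0.7, 0.2]))
target = ensemble_td_target(1.0, 0.0, np.array([2.0, 4.0]), np.array([3.0, 1.0]), gamma=0.5)
```

Averaging over members is one plausible way an ensemble could reduce variance in the target; other aggregation rules (such as a minimum or a weighted mean) would fit the abstract's description equally well.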