Abstract: Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach to effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, stronger inter-level communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet most existing goal-conditioned HRL algorithms have focused primarily on subgoal discovery and neglected inter-level cooperation. Here, we propose a goal-conditioned HRL framework named Guided Cooperation via Model-based Rollout (GCMR), which aims to bridge inter-level information synchronization and cooperation by exploiting forward dynamics. First, GCMR mitigates the state-transition error in off-policy correction via model-based rollout, thereby improving sample efficiency. Second, to prevent disruption by unseen subgoals and states, the gradients of the lower-level Q-function are constrained by a gradient penalty with a model-inferred upper bound, yielding a more stable behavioral policy that is conducive to effective exploration. Third, we propose a one-step rollout-based planning method that uses higher-level critics to guide the lower-level policy. Specifically, we estimate the value of future states of the lower-level policy with the higher-level critic function, thereby transmitting global task information downward to avoid local pitfalls. Together, these three components are expected to significantly facilitate inter-level cooperation. Experimental results demonstrate that combining the proposed GCMR framework with ACLG, a disentangled variant of HIGL, yields more stable and robust policy improvement than various baselines and significantly outperforms previous state-of-the-art algorithms.
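The abstract names the lower-level ingredients without giving formulas, so the following is only a minimal NumPy sketch under stated assumptions, not the GCMR implementation. The dynamics model, both critics, the weight `lam`, and the hinge form of the penalty are hypothetical stand-ins chosen for a toy 2-D point-mass task. The sketch shows how a one-step imagined rollout lets a higher-level value score a lower-level action, and how a gradient penalty activates only when the action gradient of the lower-level critic exceeds a bound. In GCMR the bound is model-inferred; here it is simply passed in as a number.

```python
import numpy as np

# Hypothetical toy stand-ins (not from GCMR): a known linear forward model and
# hand-written critics for a 2-D point-mass goal-reaching task. In practice these
# would be learned networks and the objective would be optimized with autodiff.

def dynamics_model(s, a):
    """One-step forward model s' = f(s, a); here simply s + a."""
    return s + a

def q_low(s, g, a):
    """Toy lower-level critic: negative squared distance of s' to the subgoal g."""
    return -np.sum((dynamics_model(s, a) - g) ** 2)

def grad_a_q_low(s, g, a):
    """Analytic gradient of the toy q_low with respect to the action a."""
    return -2.0 * (dynamics_model(s, a) - g)

def v_high(s, task_goal):
    """Stand-in for the higher-level critic at a state: negative distance to the task goal."""
    return -np.linalg.norm(s - task_goal)

def guided_objective(s, g, a, task_goal, lam=0.5):
    """Lower-level objective plus a bonus from the higher-level critic at the imagined next state."""
    s_next = dynamics_model(s, a)  # one-step model rollout
    return q_low(s, g, a) + lam * v_high(s_next, task_goal)

def gradient_penalty(s, g, a, upper_bound):
    """Hinge penalty on ||dQ_low/da|| above upper_bound (hypothetical penalty form)."""
    excess = np.linalg.norm(grad_a_q_low(s, g, a)) - upper_bound
    return max(0.0, excess) ** 2
```

For example, with `s = [0, 0]`, subgoal `g = [1, 0]`, and task goal `[5, 5]`, the action `[1, 0]` reaches the subgoal exactly and scores best when `lam=0.0`, whereas with `lam=1.0` the higher-level term makes `[0.9, 0.3]` score higher because its imagined next state lies closer to the task goal.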

Efficient Hierarchical Exploration with an Active Subgoal Generation Strategy

Active Hierarchical Exploration with Stable Subgoal Representation Learning

Learning Hierarchical Graph-Based Policy for Goal-Reaching in Unknown Environments

Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning

Subgoal-based Hierarchical Reinforcement Learning for Multi-Agent Collaboration

Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs

Probabilistic Subgoal Representations for Hierarchical Reinforcement Learning

Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies

Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards

Hierarchical Reinforcement Learning for Handling Sparse Rewards in Multi-Goal Navigation

ε-Invariant Hierarchical Reinforcement Learning for Building Generalizable Policy

Efficient Exploration through Intrinsic Motivation Learning for Unsupervised Subgoal Discovery in Model-Free Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning with Natural Language Subgoals

Landmark Guided Active Exploration with State-specific Balance Coefficient

HRL2E: Hierarchical Reinforcement Learning with Low-level Ensemble

Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout

MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint

Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning

Goal-Conditioned Hierarchical Reinforcement Learning with High-Level Model Approximation

Adjacency Constraint for Efficient Hierarchical Reinforcement Learning

HAC Explore: Accelerating Exploration with Hierarchical Reinforcement Learning