Abstract: Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened inter-level communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet most existing goal-conditioned HRL algorithms have focused primarily on subgoal discovery, neglecting inter-level cooperation. Here, we propose a goal-conditioned HRL framework named Guided Cooperation via Model-based Rollout (GCMR), which aims to bridge inter-layer information synchronization and cooperation by exploiting forward dynamics. First, GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Second, to prevent disruption by unseen subgoals and states, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy conducive to effective exploration. Third, we propose one-step rollout-based planning, which uses higher-level critics to guide the lower-level policy. Specifically, we estimate the value of future states of the lower-level policy using the higher-level critic function, thereby transmitting global task information downwards to avoid local pitfalls. These three components of GCMR are expected to significantly facilitate inter-level cooperation. Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of HIGL, namely ACLG, yields more stable and robust policy improvement compared to various baselines and significantly outperforms previous state-of-the-art algorithms.
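The third component, one-step rollout-based planning, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`one_step_guidance`, `toy_dynamics`, `toy_high_critic`), the discrete set of candidate actions, the additive dynamics, and the negative-distance critic are all assumptions chosen only to show the idea of scoring a lower-level action by the higher-level value of the state it is predicted to reach.

```python
import numpy as np

def one_step_guidance(state, actions, dynamics, high_critic, goal):
    """Score candidate low-level actions by the higher-level value
    of the next state predicted by a forward dynamics model."""
    # One-step model-based rollout: predict the next state for each action.
    next_states = [dynamics(state, a) for a in actions]
    # Evaluate each predicted state with the higher-level critic.
    values = np.array([high_critic(s_next, goal) for s_next in next_states])
    return values, int(np.argmax(values))

# Toy stand-ins (assumptions, not from the paper).
def toy_dynamics(state, action):
    # Hypothetical forward model: the action displaces the state.
    return state + action

def toy_high_critic(state, goal):
    # Hypothetical higher-level value: negative distance to the final goal.
    return -float(np.linalg.norm(state - goal))

state = np.array([0.0, 0.0])
goal = np.array([1.0, 0.0])
actions = [np.array([0.1, 0.0]), np.array([-0.1, 0.0]), np.array([0.0, 0.1])]

values, best = one_step_guidance(state, actions, toy_dynamics, toy_high_critic, goal)
print(best, values)
```

In this toy setting the action that moves toward the goal receives the highest score. A learned setting would replace both stand-ins with trained networks and would more plausibly use the higher-level value as a differentiable guidance term in the lower-level policy objective than as a discrete selection rule.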

Active Hierarchical Exploration with Stable Subgoal Representation Learning

Efficient Hierarchical Exploration with an Active Subgoal Generation Strategy

Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs

Learning Hierarchical Graph-Based Policy for Goal-Reaching in Unknown Environments

Probabilistic Subgoal Representations for Hierarchical Reinforcement learning

Learning Subgoal Representations with Slow Dynamics

Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout

Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning

Hierarchical reinforcement learning with natural language subgoals

Subgoal-based Hierarchical Reinforcement Learning for Multi-Agent Collaboration

Landmark Guided Active Exploration with State-specific Balance Coefficient

HRL2E: Hierarchical Reinforcement Learning with Low-level Ensemble

Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies

Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning

Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards

$\epsilon$-Invariant Hierarchical Reinforcement Learning for Building Generalizable Policy

Goal Space Abstraction in Hierarchical Reinforcement Learning via Reachability Analysis

LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning

Goal Space Abstraction in Hierarchical Reinforcement Learning via Set-Based Reachability Analysis

Goal-Conditioned Hierarchical Reinforcement Learning with High-Level Model Approximation

HAC Explore: Accelerating Exploration with Hierarchical Reinforcement Learning