Abstract: Hierarchical reinforcement learning (HRL) is a promising way to extend traditional reinforcement learning to more complex tasks, and it can address the problems of long-term reward sparsity and credit assignment. However, existing HRL methods are trained anew for each specific environment and target task, which results in low sample utilization. In addition, the agent's low-level sub-policies interfere with each other during transfer, which results in poor policy stability. To address these issues, this paper proposes an HRL method, Relabeling and Policy Distillation of Hierarchical Reinforcement Learning (R-PD-HRL), that integrates meta-learning, shared reward relabeling, and policy distillation to accelerate learning and improve the agent's policy stability. During training, a reward relabeling module acts on the experience buffer: different reward functions are used to relabel an interaction trajectory so that it can be used to train other tasks drawn from the same task distribution. At the low level, policy distillation compresses the sub-policies, reducing interference between them while preserving the correctness of the original sub-policies. Finally, depending on the task, the high-level policy calls the optimal low-level policy to complete the decision. Experiments in both continuous and discrete state-action environments show that, compared with other methods, the improved sample utilization of this method greatly accelerates learning, and the success rate reaches as high as 0.6.
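The reward relabeling step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Transition` record, the sparse goal-reaching reward in `make_goal_reward`, the tolerance `tol`, and the `relabel` helper are all hypothetical names and choices introduced here, assuming only that each task in the distribution is defined by its own reward function over the same states and actions.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

State = Tuple[float, float]
Action = Tuple[float, float]
RewardFn = Callable[[State, Action, State], float]


@dataclass(frozen=True)
class Transition:
    """One step of experience (hypothetical record layout)."""
    state: State
    action: Action
    next_state: State
    reward: float
    task_id: str


def make_goal_reward(goal: State, tol: float = 0.5) -> RewardFn:
    """Assumed sparse reward: 1.0 when next_state is within tol of goal."""
    def reward_fn(state: State, action: Action, next_state: State) -> float:
        dx = next_state[0] - goal[0]
        dy = next_state[1] - goal[1]
        return 1.0 if (dx * dx + dy * dy) ** 0.5 <= tol else 0.0
    return reward_fn


def relabel(
    trajectory: List[Transition],
    reward_fns: Dict[str, RewardFn],
) -> Dict[str, List[Transition]]:
    """Recompute rewards for one trajectory under every task's reward function.

    The states and actions are reused unchanged; only the reward and the
    task label differ, so a single rollout yields data for each task.
    """
    relabeled: Dict[str, List[Transition]] = {}
    for task_id, reward_fn in reward_fns.items():
        relabeled[task_id] = [
            replace(
                t,
                reward=reward_fn(t.state, t.action, t.next_state),
                task_id=task_id,
            )
            for t in trajectory
        ]
    return relabeled
```

Under these assumptions, a trajectory collected while pursuing one goal can be passed to `relabel` together with the reward functions of the other tasks, and each returned list can be appended to the corresponding task's experience buffer.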

$\epsilon$-Invariant Hierarchical Reinforcement Learning for Building Generalizable Policy

Learning Hierarchical Graph-Based Policy for Goal-Reaching in Unknown Environments

HRL2E: Hierarchical Reinforcement Learning with Low-level Ensemble

Efficient Hierarchical Exploration with an Active Subgoal Generation Strategy

Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards

Active Hierarchical Exploration with Stable Subgoal Representation Learning

Data-Efficient Hierarchical Reinforcement Learning for Robotic Assembly Control Applications

Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies

Goal-Conditioned Hierarchical Reinforcement Learning with High-Level Model Approximation

Relabeling and policy distillation of hierarchical reinforcement learning

Adjacency Constraint for Efficient Hierarchical Reinforcement Learning

Temporal-adaptive Hierarchical Reinforcement Learning

Algorithms for Batch Hierarchical Reinforcement Learning

Probabilistic Subgoal Representations for Hierarchical Reinforcement learning

Sub-policy Adaptation for Hierarchical Reinforcement Learning

Learning Invariable Semantical Representation from Language for Extensible Policy Generalization

Goal Space Abstraction in Hierarchical Reinforcement Learning via Set-Based Reachability Analysis

Abstract Value Iteration for Hierarchical Reinforcement Learning

Goal Space Abstraction in Hierarchical Reinforcement Learning via Reachability Analysis

Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs