Abstract: Hierarchical reinforcement learning (HRL) is a promising way to extend traditional reinforcement learning to more complex tasks, because it can mitigate long-horizon reward sparsity and credit assignment problems. However, existing HRL methods are trained anew for each specific environment and target task, which results in low sample utilization. In addition, the agent's low-level sub-policies interfere with each other during transfer, which results in poor policy stability. To address these issues, this paper proposes an HRL method, Relabeling and Policy Distillation of Hierarchical Reinforcement Learning (R-PD-HRL), that integrates meta-learning, shared reward relabeling, and policy distillation to accelerate learning and improve the stability of the agent's policy. During training, a reward relabeling module acts on the experience buffer: different reward functions are used to relabel interaction trajectories so that they can be used to train other tasks from the same task distribution. At the low level, policy distillation compresses the sub-policies, reducing interference between them while preserving the correctness of the original sub-policies. Finally, depending on the task, the high-level policy calls the optimal low-level policy to complete the decision. Experimental results in both continuous and discrete state-action environments show that, compared with other methods, the improved sample utilization of this method greatly accelerates learning, and the success rate reaches as high as 0.6.
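The reward relabeling step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `Transition` record, the sparse goal-reaching reward built by `make_goal_reward`, and the `relabel` helper are all hypothetical names and assumptions chosen for illustration. The only idea taken from the abstract is that stored trajectories are re-scored with the reward functions of other tasks from the same distribution, so one interaction can serve several tasks.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

State = Tuple[float, float]
Action = Tuple[float, float]
RewardFn = Callable[[State, Action, State], float]


@dataclass(frozen=True)
class Transition:
    """One stored interaction step (hypothetical buffer record)."""
    state: State
    action: Action
    next_state: State
    reward: float
    task_id: str


def make_goal_reward(goal: State, tol: float = 0.5) -> RewardFn:
    """Assumed sparse reward: 1.0 when next_state lies within tol of the goal."""
    def reward_fn(state: State, action: Action, next_state: State) -> float:
        # Chebyshev distance between the reached state and the goal.
        dist = max(abs(next_state[0] - goal[0]), abs(next_state[1] - goal[1]))
        return 1.0 if dist <= tol else 0.0
    return reward_fn


def relabel(buffer: List[Transition],
            reward_fns: Dict[str, RewardFn]) -> List[Transition]:
    """Re-score every stored transition under every task's reward function.

    The dynamics (state, action, next_state) are left untouched; only the
    reward and the task label change, which is valid when the tasks share
    the same environment dynamics.
    """
    relabeled = []
    for t in buffer:
        for task_id, fn in reward_fns.items():
            new_reward = fn(t.state, t.action, t.next_state)
            relabeled.append(replace(t, reward=new_reward, task_id=task_id))
    return relabeled
```

With N stored transitions and K task reward functions, this sketch yields N × K training samples without further environment interaction, which is the sample-utilization gain the abstract refers to. A practical version would likely relabel only a sampled subset of tasks per transition to bound the buffer size.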

Hierarchical Orchestra of Policies

Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning

HCS-R-HER: Hierarchical Reinforcement Learning Based on Cross Subtasks Rainbow Hindsight Experience Replay

Continual Task Learning through Adaptive Policy Self-Composition

Temporal-adaptive Hierarchical Reinforcement Learning

HLifeRL: A Hierarchical Lifelong Reinforcement Learning Framework

Hierarchical reinforcement learning for efficient exploration and transfer

Encoding Primitives Generation Policy Learning for Robotic Arm to Overcome Catastrophic Forgetting in Sequential Multi-Tasks Learning

Sub-policy Adaptation for Hierarchical Reinforcement Learning

Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs

Evolving hierarchical memory-prediction machines in multi-task reinforcement learning

Hierarchical Reinforcement Learning in Complex 3D Environments

Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning

Offline Hierarchical Reinforcement Learning via Inverse Optimization

Lifetime policy reuse and the importance of task capacity

I Know How: Combining Prior Policies to Solve New Tasks

Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?

Relabeling and policy distillation of hierarchical reinforcement learning

Hierarchical Policy Learning is Sensitive to Goal Space Design

On the benefits of pixel-based hierarchical policies for task generalization

Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies