Abstract: In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration. However, prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design. In previous work on continuous control, the sensitivity of methods to this trade-off has not been addressed explicitly, as locomotion provides a suitable prior for navigation tasks, which have been of foremost interest. In this work, we analyze this trade-off for low-level policy pre-training with a new benchmark suite of diverse, sparse-reward tasks for bipedal robots. We alleviate the need for prior knowledge by proposing a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner. For utilization on downstream tasks, we present a three-layered hierarchical learning algorithm to automatically trade off between general and specific skills as required by the respective task. In our experiments, we show that our approach performs this trade-off effectively and achieves better results than current state-of-the-art methods for end-to-end hierarchical reinforcement learning and unsupervised skill discovery. Code and videos are available at <a class="link-external link-https" href="https://facebookresearch.github.io/hsd3" rel="external noopener nofollow">this https URL</a>.

What problem does this paper attempt to address?

### What problem does this paper attempt to solve?

This paper addresses how to use pre-trained low-level skills in reinforcement learning (RL) to facilitate exploration in complex environments. Specifically, it focuses on balancing the generality and specificity of skills through a hierarchical skill learning framework, without requiring task-specific prior knowledge, thereby improving exploration efficiency.

#### Main problems and challenges

1. **Trade-off in skill design**:
   - Pre-trained low-level skills can greatly facilitate exploration, but skill design must balance generality (fine-grained control) against specificity (faster learning).
   - Previous work on continuous control has focused mainly on navigation tasks, for which locomotion is a suitable prior, so the sensitivity of methods to this trade-off was not addressed explicitly.
2. **Skill acquisition without prior knowledge**:
   - The paper introduces a new benchmark suite of diverse, sparse-reward tasks for bipedal robots.
   - To reduce the dependence on prior knowledge, the paper proposes a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
3. **Dynamic skill selection**:
   - A three-layer hierarchical learning algorithm automatically trades off between general and specific skills as each downstream task requires.
   - In the reported experiments, this approach handles diverse tasks and achieves better results than existing end-to-end hierarchical reinforcement learning methods.

#### Solution overview

- **Hierarchical skill learning framework**: Acquires a set of low-level skills of varying complexity through unsupervised pre-training.
- **Three-layer hierarchical policy**: The top layer selects a skill (which defines the goal space), the middle layer selects a specific goal within that space, and the lowest layer performs low-level control.
- **New benchmark suite**: Contains multiple sparse-reward tasks for evaluating hierarchical skill learning.

#### Experimental results

- **Generality and specificity of skills**: Different low-level skills perform differently on different tasks, and no single skill is best across all of them.
- **Advantages of dynamic skill selection**: The three-layer hierarchical policy performs well on all tasks and even outperforms a single skill optimized for a specific task.
- **Comparison with existing methods**: The HSD-3 method performs better than existing non-hierarchical and hierarchical reinforcement learning methods on multiple benchmark tasks.

In conclusion, the paper shows how pre-trained low-level skills can be used to facilitate exploration without task-specific prior knowledge, by proposing a new hierarchical skill learning framework and demonstrating its performance on a variety of complex tasks.
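The three-layer decision flow described in the solution overview can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each skill corresponds to a subset of state features forming a goal space, and the feature names, the random placeholder policies, and the function names (`goal_spaces`, `top_policy`, `mid_policy`, `low_policy`, `act`) are all hypothetical.

```python
import random
from itertools import combinations

# Hypothetical feature names, chosen only for illustration.
FEATURES = ("torso_x", "torso_z", "left_foot", "right_foot")


def goal_spaces(features):
    """Enumerate every non-empty feature subset as a candidate skill (goal space)."""
    return [
        subset
        for size in range(1, len(features) + 1)
        for subset in combinations(features, size)
    ]


def top_policy(obs, spaces, rng):
    """Top layer: pick which skill (goal space) to use. Random placeholder."""
    return rng.choice(spaces)


def mid_policy(obs, space, rng):
    """Middle layer: pick a concrete target value for each feature in the space."""
    return {name: rng.uniform(-1.0, 1.0) for name in space}


def low_policy(obs, goal, action_dim, rng):
    """Lowest layer: stand-in for a pre-trained, goal-conditioned controller."""
    return [rng.uniform(-1.0, 1.0) for _ in range(action_dim)]


def act(obs, rng, action_dim=6):
    """Run one pass through all three layers and return each layer's choice."""
    space = top_policy(obs, goal_spaces(FEATURES), rng)
    goal = mid_policy(obs, space, rng)
    action = low_policy(obs, goal, action_dim, rng)
    return space, goal, action
```

Calling `act({}, random.Random(0))` returns the chosen goal space, the goal within it, and a low-level action. In a learned system each placeholder would be a trained policy, and the upper layers would typically act on a slower timescale than the lowest one.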

Hierarchical Skills for Efficient Exploration

Unsupervised Discovery of Transitional Skills for Deep Reinforcement Learning

Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning

Sub-policy Adaptation for Hierarchical Reinforcement Learning

Adaptive and Explainable Deployment of Navigation Skills via Hierarchical Deep Reinforcement Learning

Hierarchical Reinforcement Learning for Quadruped Locomotion

Hierarchical Reinforcement Learning Integrating With Human Knowledge for Practical Robot Skill Learning in Complex Multi-Stage Manipulation

Creating Multi-Level Skill Hierarchies in Reinforcement Learning

SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning

Hierarchical reinforcement learning with natural language subgoals

Hierarchical reinforcement learning for efficient exploration and transfer

HAC Explore: Accelerating Exploration with Hierarchical Reinforcement Learning

Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning

Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery

Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?

Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning

Unsupervised Skill Discovery for Robotic Manipulation through Automatic Task Generation

Efficient Hierarchical Exploration with an Active Subgoal Generation Strategy

Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards

Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies