Abstract: The option framework, one of the most promising Hierarchical Reinforcement Learning (HRL) frameworks, is built on the Semi-Markov Decision Problem (SMDP) and formulates each option as a triple (i.e., an action policy, a termination probability, and an initiation set). These design choices, however, mean that the option framework: 1) has low sample efficiency, 2) cannot use more stable Markov Decision Problem (MDP) based learning algorithms, 3) represents abstract actions implicitly, and 4) is expensive to scale up. To overcome these problems, we propose a simple yet effective MDP implementation of the option framework: the Skill-Action (SA) architecture. Derived from the novel discovery that the SMDP option framework has an MDP equivalent, SA hierarchically extracts skills (abstract actions) from primary actions and explicitly encodes this knowledge into skill context vectors (embedding vectors). Although SA is MDP-formulated, skills can still be temporally extended by applying the attention mechanism to skill context vectors. Unlike the option framework, which requires M action policies for M skills, SA's action policy needs only one decoder to decode skill context vectors into primary actions. Under this formulation, SA can be optimized with any MDP-based policy gradient algorithm. Moreover, it is sample efficient, cheap to scale up, and theoretically proven to have lower variance. Our empirical studies on challenging infinite-horizon robot simulation environments demonstrate that SA not only outperforms all baselines by a large margin, but also exhibits smaller variance, faster convergence, and good interpretability. On transfer learning tasks, SA also outperforms the other models and shows its advantage in reusing knowledge across tasks. A potential impact of SA is to pave the way for a large-scale pre-training architecture in reinforcement learning.
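The structure described in the abstract (a table of skill context vectors, attention over those vectors to select a skill, and a single decoder shared by all skills) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `SkillActionSketch`, the scaled dot-product form of the attention, conditioning the query on the previous skill, the linear decoder, and the discrete action space are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

class SkillActionSketch:
    """Hypothetical illustration: M skill embeddings, one shared decoder."""

    def __init__(self, n_skills, state_dim, embed_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.n_skills = n_skills
        self.embed_dim = embed_dim
        # One context (embedding) vector per skill.
        self.skill_context = rng.normal(size=(n_skills, embed_dim))
        # Assumed query projection from (state, previous skill one-hot).
        self.W_query = 0.1 * rng.normal(size=(state_dim + n_skills, embed_dim))
        # A single decoder shared by all skills (assumed linear here).
        self.W_dec = 0.1 * rng.normal(size=(state_dim + embed_dim, n_actions))

    def skill_probs(self, state, prev_skill):
        """Attention weights over skill context vectors."""
        prev_onehot = np.eye(self.n_skills)[prev_skill]
        query = np.concatenate([state, prev_onehot]) @ self.W_query
        scores = self.skill_context @ query / np.sqrt(self.embed_dim)
        return softmax(scores)

    def action_probs(self, state, skill):
        """Decode the chosen skill's context vector into action probabilities."""
        ctx = self.skill_context[skill]
        logits = np.concatenate([state, ctx]) @ self.W_dec
        return softmax(logits)

    def step(self, state, prev_skill, rng):
        """Sample a skill, then a primary action, for one MDP step."""
        p_skill = self.skill_probs(state, prev_skill)
        skill = int(rng.choice(self.n_skills, p=p_skill))
        p_action = self.action_probs(state, skill)
        action = int(rng.choice(len(p_action), p=p_action))
        return skill, action

if __name__ == "__main__":
    model = SkillActionSketch(n_skills=4, state_dim=3, embed_dim=8, n_actions=5)
    rng = np.random.default_rng(1)
    skill, action = model.step(np.zeros(3), prev_skill=0, rng=rng)
    print(skill, action)
```

In this sketch, adding a skill adds one row to `skill_context` rather than a new policy network, which is the scaling property the abstract attributes to the single-decoder design. Because both sampling steps depend only on the current state and the previous skill, the whole procedure is one ordinary MDP step that a standard policy gradient method could in principle optimize.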

Meta-Learning Parameterized Skills

Practice Makes Perfect: Planning to Learn Skill Parameter Policies

Autonomous discovery of the goal space to learn a parameterized skill

Learning Task-Parameterized Skills from Few Demonstrations

The Skill-Action Architecture: Learning Abstract Action Embeddings for Reinforcement Learning

Heterogeneous Skill Learning for Multi-agent Tasks

Learning Neuro-Symbolic Skills for Bilevel Planning

Search-Based Task Planning with Learned Skill Effect Models for Lifelong Robotic Manipulation

Learning and generalization of task-parameterized skills through few human demonstrations

SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

Meta Learning Shared Hierarchies

LEAGUE: Guided Skill Learning and Abstraction for Long-Horizon Manipulation

Memory-Based Parameterized Skills Learning for Mapless Visual Navigation

Combining Model-Based $Q$-Learning with Structural Knowledge Transfer for Robot Skill Learning

Choreographer: Learning and Adapting Skills in Imagination

Scaling simulation-to-real transfer by learning composable robot skills

Automata-Guided Hierarchical Reinforcement Learning for Skill Composition

MGHRL: Meta Goal-generation for Hierarchical Reinforcement Learning

Skill Transfer and Discovery for Sim-to-Real Learning: A Representation-Based Viewpoint

Hierarchical Meta-Reinforcement Learning via Automated Macro-Action Discovery

Skill matters: Dynamic skill learning for multi-agent cooperative reinforcement learning