Abstract: The option framework, one of the most promising Hierarchical Reinforcement Learning (HRL) frameworks, is developed based on the Semi-Markov Decision Problem (SMDP) and employs a triple formulation of the option (i.e., an action policy, a termination probability, and an initiation set). These design choices, however, mean that the option framework: 1) has low sample efficiency, 2) cannot use more stable Markov Decision Problem (MDP)-based learning algorithms, 3) represents abstract actions implicitly, and 4) is expensive to scale up. To overcome these problems, we propose a simple yet effective MDP implementation of the option framework: the Skill-Action (SA) architecture. Derived from the novel finding that the SMDP option framework has an MDP equivalent, SA hierarchically extracts skills (abstract actions) from primary actions and explicitly encodes this knowledge into skill context vectors (embedding vectors). Although SA is formulated as an MDP, skills can still be temporally extended by applying an attention mechanism to the skill context vectors. Unlike the option framework, which requires M action policies for M skills, SA's action policy needs only one decoder to decode skill context vectors into primary actions. Under this formulation, SA can be optimized with any MDP-based policy gradient algorithm. Moreover, it is sample efficient, cheap to scale up, and theoretically proven to have lower variance. Our empirical studies on challenging infinite-horizon robot simulation environments demonstrate that SA not only outperforms all baselines by a large margin but also exhibits smaller variance, faster convergence, and good interpretability. On transfer learning tasks, SA also outperforms the other models and shows its advantage in reusing knowledge across tasks. A potential impact of SA is to pave the way for large-scale pre-training architectures in reinforcement learning.
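To make the architecture described in the abstract more concrete, here is a minimal NumPy sketch built on assumptions of our own: single-head dot-product attention, plain linear maps, random weights, and no training loop. Every name in it (`SkillActionSketch`, `W_q`, `W_dec`, `decoder_param_counts`) is hypothetical, and it is not the authors' implementation. It illustrates three ideas from the abstract: skills stored as rows of an embedding matrix (skill context vectors); a skill choice made by attending over those vectors with a query built from the state and the previous skill (in this sketch, conditioning on the previous skill is what lets a skill persist across steps while the decision process stays Markov in the augmented state); and a single shared decoder that maps the state plus any skill vector to a primary action.

```python
import numpy as np


class SkillActionSketch:
    """Hypothetical toy of the SA idea: M skill context vectors,
    attention-based skill selection, and ONE decoder shared by all skills.
    Assumed design (not from the paper): linear maps, single-head attention."""

    def __init__(self, obs_dim, act_dim, n_skills, emb_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + emb_dim
        # Skill context vectors: one embedding row per skill.
        self.skills = rng.normal(size=(n_skills, emb_dim))
        # Query projection: (state, previous skill embedding) -> query vector.
        self.W_q = rng.normal(size=(in_dim, emb_dim)) / np.sqrt(in_dim)
        # Single shared decoder: (state, skill embedding) -> action.
        self.W_dec = rng.normal(size=(in_dim, act_dim)) / np.sqrt(in_dim)

    def skill_probs(self, obs, prev_skill):
        # Build the query from the state and the previously active skill.
        query = np.concatenate([obs, self.skills[prev_skill]]) @ self.W_q
        # Scaled dot-product scores against every skill context vector.
        scores = self.skills @ query / np.sqrt(self.skills.shape[1])
        scores -= scores.max()  # subtract max for a numerically stable softmax
        p = np.exp(scores)
        return p / p.sum()

    def act(self, obs, skill):
        # The same decoder weights are used no matter which skill is active.
        return np.tanh(np.concatenate([obs, self.skills[skill]]) @ self.W_dec)

    def step(self, obs, prev_skill, rng):
        # Sample a skill from the attention distribution, then decode an action.
        p = self.skill_probs(obs, prev_skill)
        skill = int(rng.choice(len(p), p=p))
        return skill, self.act(obs, skill)


def decoder_param_counts(obs_dim, act_dim, n_skills, emb_dim):
    """Toy weight counts (biases ignored) for linear action heads."""
    # Option-style: one separate linear action policy per skill.
    per_skill_policies = n_skills * obs_dim * act_dim
    # SA-style: one shared decoder plus one embedding vector per skill.
    shared_decoder = (obs_dim + emb_dim) * act_dim + n_skills * emb_dim
    return per_skill_policies, shared_decoder
```

Under these toy assumptions, adding a skill costs only `emb_dim` new parameters, whereas per-skill action policies grow by a full policy each time; for example, `decoder_param_counts(17, 6, 4, 8)` returns `(408, 182)`.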

Adversarial Option-Aware Hierarchical Imitation Learning

HILONet: Hierarchical Imitation Learning from Non-Aligned Observations

Online Baum-Welch algorithm for Hierarchical Imitation Learning

OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

Learning from Guided Play: Improving Exploration for Adversarial Imitation Learning with Simple Auxiliary Tasks

Latent Policies for Adversarial Imitation Learning

ACGAIL: Imitation Learning About Multiple Intentions with Auxiliary Classifier GANs

Transferring Hierarchical Structure with Dual Meta Imitation Learning

Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration

Situated GAIL: Multitask imitation using task-conditioned adversarial inverse reinforcement learning

Non-Adversarial Imitation Learning and its Connections to Adversarial Methods

GAILPG: Multi-Agent Policy Gradient with Generative Adversarial Imitation Learning

On Generalization of Adversarial Imitation Learning and Beyond

Generative Adversarial Imitation Learning from Failed Experiences

Multi-task Hierarchical Adversarial Inverse Reinforcement Learning

Multi-Level Discovery of Deep Options

Interpretable Generative Adversarial Imitation Learning

The Skill-Action Architecture: Learning Abstract Action Embeddings for Reinforcement Learning

On Computation and Generalization of Generative Adversarial Imitation Learning

Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization

Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions