Abstract: Imitation learning is a promising class of policy learning algorithms that is free from many practical issues of reinforcement learning, such as reward design and hard exploration. However, current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously. Behavioral Cloning (BC) does not need in-environment interactions, but it suffers from the covariate shift problem, which harms its performance. Adversarial Imitation Learning (AIL) turns imitation learning into a distribution matching problem. It can achieve better performance on some tasks, but it requires a large number of in-environment interactions. Inspired by the recent success of EfficientZero in RL, we propose EfficientImitate (EI), a planning-based imitation learning method that achieves high in-environment sample efficiency and high performance simultaneously. Our algorithmic contribution is two-fold. First, we extend AIL to MCTS-based RL. Second, we show that two seemingly incompatible classes of imitation algorithms (BC and AIL) can be naturally unified under our framework, enjoying the benefits of both. We benchmark our method not only on the state-based DeepMind Control Suite, but also on its image-based version, which many previous works find highly challenging. Experimental results show that EI achieves state-of-the-art performance and sample efficiency. EI shows an over 4x gain in performance in the limited-sample setting on state-based and image-based tasks, and can solve challenging problems like Humanoid, where previous methods fail with a small number of interactions. Our code is available at https://github.com/zhaohengyin/EfficientImitate.

Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments

Deep Demonstration Tracing: Learning Generalizable Imitator Policy for Runtime Imitation from a Single Demonstration

Efficient Off-policy Adversarial Imitation Learning with Imperfect Demonstrations

Planning for Sample Efficient Imitation Learning

Off-policy adversarial imitation learning for robotic tasks with low-quality demonstrations

Learning from demonstrations: An intuitive VR environment for imitation learning of construction robots

Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories

Extraneousness-Aware Imitation Learning

Limited Preference Aided Imitation Learning from Imperfect Demonstrations

An Empirical Investigation of Representation Learning for Imitation

Learning Feasibility to Imitate Demonstrators with Different Dynamics

Seeing Differently, Acting Similarly: Heterogeneously Observable Imitation Learning

Adversarial imitation learning with mixed demonstrations from multiple demonstrators

Robust Visual Imitation Learning with Inverse Dynamics Representations

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms

Zero-Shot Visual Imitation

MimicPlay: Long-Horizon Imitation Learning by Watching Human Play

Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations

Sample Efficient Imitation Learning via Reward Function Trained in Advance

How to Leverage Diverse Demonstrations in Offline Imitation Learning

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning