Abstract: Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al., 2019) demonstrating the effectiveness of off-policy learning algorithms in improving sample efficiency and scalability to higher-dimensional observations. Despite DAC's empirical success, the original AIL objective is on-policy, and DAC's ad-hoc application of off-policy training does not guarantee successful imitation (Kostrikov et al., 2019; 2020). Follow-up work such as ValueDICE (Kostrikov et al., 2020) tackles this issue by deriving a fully off-policy AIL objective. In this work, we instead develop a novel and principled AIL algorithm via the framework of boosting. Like boosting, our new algorithm, AILBoost, maintains an ensemble of properly weighted weak learners (i.e., policies) and trains a discriminator that witnesses the maximum discrepancy between the distributions of the ensemble and the expert policy. We maintain a weighted replay buffer to represent the state-action distribution induced by the ensemble, allowing us to train discriminators using all the data collected so far. In the weighted replay buffer, the contribution of data from older policies is properly discounted, with weights computed from the boosting framework. Empirically, we evaluate our algorithm on both controller-state-based and pixel-based environments from the DeepMind Control Suite. AILBoost outperforms DAC on both types of environments, demonstrating the benefit of properly weighting replay buffer data for off-policy training. On state-based environments, AILBoost outperforms ValueDICE and IQ-Learn (Garg et al., 2021), achieving competitive performance with as little as one expert trajectory.
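The weighted replay buffer idea can be illustrated with a minimal sketch. This is not AILBoost's implementation: it assumes the ensemble is updated as a convex combination in which the newest weak learner receives a fixed weight `alpha` and every older learner's weight is multiplied by `(1 - alpha)`, and the class `WeightedReplayBuffer` and its methods are hypothetical names chosen for illustration. Sampling a learner according to its mixture weight, then a transition uniformly from that learner's data, yields batches distributed like the ensemble's state-action mixture, which is the distribution a discriminator would contrast with the expert data.

```python
import random
from typing import Any, List


class WeightedReplayBuffer:
    """Hypothetical buffer whose samples follow a mixture over past policies.

    Assumption (not taken from the paper): the ensemble distribution is updated as
    d_new = (1 - alpha) * d_old + alpha * d_{pi_t}, with a fixed step size alpha.
    """

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must lie in (0, 1]")
        self.alpha = alpha
        self.chunks: List[List[Any]] = []  # transitions collected by each weak learner
        self.weights: List[float] = []     # mixture weight of each weak learner

    def add_policy_data(self, transitions: List[Any]) -> None:
        """Add one weak learner's data and rescale all mixture weights."""
        if not transitions:
            raise ValueError("each weak learner must contribute at least one transition")
        if not self.weights:
            # The first learner makes up the whole ensemble.
            self.weights.append(1.0)
        else:
            # Discount every older learner by (1 - alpha); the newest gets alpha.
            # The weights therefore keep summing to 1.
            self.weights = [w * (1.0 - self.alpha) for w in self.weights]
            self.weights.append(self.alpha)
        self.chunks.append(list(transitions))

    def sample(self, batch_size: int, rng: random.Random) -> List[Any]:
        """Draw a batch whose composition follows the ensemble's mixture weights."""
        learner_ids = rng.choices(range(len(self.chunks)), weights=self.weights, k=batch_size)
        return [rng.choice(self.chunks[i]) for i in learner_ids]


buf = WeightedReplayBuffer(alpha=0.5)
buf.add_policy_data(["a"])  # data from the first weak learner
buf.add_policy_data(["b"])  # second learner
buf.add_policy_data(["c"])  # newest learner
print(buf.weights)  # [0.25, 0.25, 0.5]
batch = buf.sample(1000, random.Random(0))
```

With `alpha=0.5` and three learners, the weights become `[0.25, 0.25, 0.5]`: the newest policy's data fills about half of each batch, while data from the oldest policies is down-weighted rather than discarded. This is the contrast with a uniform replay buffer, where every past transition counts equally regardless of which policy produced it.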

Off-policy Imitation Learning from Visual Inputs

CEIL: Generalized Contextual Imitation Learning

Robust Visual Imitation Learning with Inverse Dynamics Representations

Extraneousness-Aware Imitation Learning

Keyframe-Focused Visual Imitation Learning

Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms

Offline Imitation Learning with Variational Counterfactual Reasoning

Visually Robust Adversarial Imitation Learning from Videos with Contrastive Learning

Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories

OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning

Long-Sighted Imitation Learning for Partially Observable Control

Adversarial Imitation Learning from Video using a State Observer

Adversarial Imitation Learning via Boosting

Limited Preference Aided Imitation Learning from Imperfect Demonstrations

Imitation Learning from Observation through Optimal Transport

Visual Imitation Made Easy

Visual Imitation Learning with Calibrated Contrastive Representation

EvIL: Evolution Strategies for Generalisable Imitation Learning

Off-Policy Imitation Learning from Observations