Abstract: Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al., 2019) demonstrating the effectiveness of off-policy learning algorithms in improving sample efficiency and scalability to higher-dimensional observations. Despite DAC's empirical success, the original AIL objective is on-policy, and DAC's ad-hoc application of off-policy training does not guarantee successful imitation (Kostrikov et al., 2019; 2020). Follow-up work such as ValueDICE (Kostrikov et al., 2020) tackles this issue by deriving a fully off-policy AIL objective. In this work, we instead develop a novel and principled AIL algorithm via the framework of boosting. Like boosting, our new algorithm, AILBoost, maintains an ensemble of properly weighted weak learners (i.e., policies) and trains a discriminator that witnesses the maximum discrepancy between the distributions of the ensemble and the expert policy. We maintain a weighted replay buffer to represent the state-action distribution induced by the ensemble, allowing us to train discriminators using the entire data collected so far. In the weighted replay buffer, the contribution of the data from older policies is properly discounted, with the weight computed based on the boosting framework. Empirically, we evaluate our algorithm on both controller state-based and pixel-based environments from the DeepMind Control Suite. AILBoost outperforms DAC on both types of environments, demonstrating the benefit of properly weighting replay buffer data for off-policy training. On state-based environments, DAC outperforms ValueDICE and IQ-Learn (Garg et al., 2021), achieving competitive performance with as little as one expert trajectory.
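
The weighted replay buffer described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the geometric decay below (the `discount` parameter) is purely an assumption for illustration and is not the boosting-derived weight used by AILBoost. The class name `WeightedReplayBuffer` and its methods are likewise hypothetical.

```python
import random
from collections import defaultdict


class WeightedReplayBuffer:
    """Stores transitions tagged by the index of the policy that collected them."""

    def __init__(self, discount=0.5, seed=0):
        # ASSUMPTION: geometric decay stands in for the boosting-derived
        # weights, which the abstract does not spell out.
        self.discount = discount
        self.data = defaultdict(list)  # policy index -> list of transitions
        self.rng = random.Random(seed)

    def add(self, policy_idx, transition):
        self.data[policy_idx].append(transition)

    def policy_weights(self):
        """Return normalized weights; older policies receive smaller weights."""
        newest = max(self.data)
        raw = {k: self.discount ** (newest - k) for k in self.data}
        total = sum(raw.values())
        return {k: w / total for k, w in raw.items()}

    def sample(self, batch_size):
        """Pick a policy according to its weight, then one of its transitions uniformly."""
        weights = self.policy_weights()
        policies = list(weights)
        probs = [weights[k] for k in policies]
        batch = []
        for _ in range(batch_size):
            k = self.rng.choices(policies, weights=probs, k=1)[0]
            batch.append(self.rng.choice(self.data[k]))
        return batch
```

Under these assumptions, a discriminator would be trained on batches drawn with `sample`, so every policy's data remains available while data from older policies is drawn less often.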

Learning Unbiased Rewards with Mutual Information in Adversarial Imitation Learning

Addressing Implicit Bias in Adversarial Imitation Learning with Mutual Information

Reward Function Shape Exploration in Adversarial Imitation Learning: an Empirical Study

Support-weighted Adversarial Imitation Learning

Addressing reward bias in Adversarial Imitation Learning with neutral reward functions

Ranking-Based Generative Adversarial Imitation Learning

Robust Adversarial Imitation Learning Via Adaptively-Selected Demonstrations

Efficient Off-policy Adversarial Imitation Learning with Imperfect Demonstrations

Improve generated adversarial imitation learning with reward variance regularization

Cosine Similarity Based Representation Learning for Adversarial Imitation Learning

Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

Adversarial Imitation Learning via Boosting

Adversarial imitation learning with mixed demonstrations from multiple demonstrators

Auto-Encoding Adversarial Imitation Learning

On Generalization of Adversarial Imitation Learning and Beyond

Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration

DiffAIL: Diffusion Adversarial Imitation Learning

Model-based Adversarial Imitation Learning from Demonstrations and Human Reward

On Computation and Generalization of Generative Adversarial Imitation Learning

Provably Efficient Adversarial Imitation Learning with Unknown Transitions

Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation