Abstract: Learning diverse and high-performance behaviors from a limited set of demonstrations is a grand challenge. Traditional imitation learning methods usually fail in this task because most of them are designed to learn one specific behavior even with multiple demonstrations. Therefore, novel techniques for quality diversity imitation learning are needed to solve the above challenge. This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL), which 1) improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE), and 2) mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus. Empirically, our method significantly outperforms state-of-the-art IL methods, achieving near-expert or beyond-expert QD performance on the challenging continuous control tasks derived from MuJoCo environments.
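The sketch below is a toy illustration of why a Wasserstein distance is an attractive training signal. It computes the exact 1-Wasserstein (earth mover's) distance between two equal-size 1-D samples. In one dimension, optimal transport simply pairs the sorted values. The distance keeps growing with the gap between the samples even when they share no support. By contrast, the Jensen-Shannon divergence underlying the original GAN and GAIL discriminator objectives stays at a constant log 2 once the supports are disjoint. This is the standard WGAN motivation for Wasserstein adversarial training.

The function name `wasserstein_1d` is hypothetical. This closed-form computation is not WQDIL's procedure, which instead performs adversarial training in the latent space of a WAE.

```python
# Hypothetical helper for illustration only; WQDIL does not compute W1 this way.
def wasserstein_1d(xs, ys):
    """Exact 1-Wasserstein distance between two equal-size 1-D empirical samples.

    In one dimension, the optimal transport plan between equal-size samples
    matches the i-th smallest x with the i-th smallest y.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


expert = [0.0, 1.0, 2.0]
for shift in (1.0, 5.0, 50.0):
    # For shifts 5 and 50 the two samples share no values,
    # yet the distance still equals the size of the shift.
    policy = [x + shift for x in expert]
    print(shift, wasserstein_1d(expert, policy))
```

Because the distance varies smoothly with the shift, a learned critic that approximates it can keep supplying a useful signal even when the policy's behavior is far from the demonstrations.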

What problem does this paper attempt to address?

This paper addresses the challenge of learning diverse, high-performance behaviors from limited demonstration data. Traditional imitation learning (IL) methods usually learn a single specific behavior, even when multiple demonstrations are available. As a result, they perform poorly on tasks that require a diverse repertoire of behaviors. Specifically, the paper identifies and attempts to solve two main problems:

1. **Training instability**: Adversarial imitation learning (AIL) methods such as GAIL are prone to unstable training, which degrades the resulting performance.
2. **Behavior overfitting**: When the demonstrations contain only a few behavior patterns, adversarial imitation learning methods may overfit to these known patterns. They then fail to guide the agent toward more diverse behaviors.

To address these problems, the paper proposes Wasserstein Quality Diversity Imitation Learning (WQDIL) and combines it with a Single-Step Archive Exploration (SSAE) strategy. WQDIL introduces a Wasserstein Auto-Encoder (WAE) and a measure-conditioned reward function. Together, these let it train the reward model more stably and encourage the agent to explore a more diverse behavior space, achieving both higher diversity and higher performance.

In summary, the main contributions of this paper are:

- Identifying the training instability and behavior overfitting problems in adversarial quality diversity imitation learning (QDIL).
- Proposing WQDIL, which improves training stability by applying Wasserstein adversarial training in the latent space of a WAE.
- Designing a measure-conditioned reward function and an exploration bonus that alleviate behavior overfitting and encourage the agent to explore more diverse behaviors.
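The exploration bonus can be pictured with a count-based sketch. This sketch assumes that the single-step archive discretizes each transition's single-step measure into grid cells. It then pays a bonus of 1/sqrt(N) for a cell visited N times, so rarely reached behaviors earn more reward.

The following names and settings are all hypothetical: the class `SingleStepArchive`, the helper `shaped_reward`, the grid bounds, and the weight `beta`. The summary above does not specify how SSAE defines its bonus or how WQDIL conditions the reward on the measure. The imitation reward is therefore simply passed in as a number.

```python
import math
from collections import defaultdict


class SingleStepArchive:
    """Hypothetical count-based archive over single-step measures (not the paper's code).

    Assumes the measure space is the box [low, high]^d, split into `bins`
    equal cells per dimension, with a visit counter per cell.
    """

    def __init__(self, low, high, bins):
        if high <= low or bins < 1:
            raise ValueError("need high > low and bins >= 1")
        self.low, self.high, self.bins = low, high, bins
        self.counts = defaultdict(int)

    def cell(self, measure):
        """Map a measure vector to its grid cell, clamping to the archive bounds."""
        idx = []
        for m in measure:
            frac = (m - self.low) / (self.high - self.low)
            idx.append(min(self.bins - 1, max(0, int(frac * self.bins))))
        return tuple(idx)

    def bonus(self, measure):
        """Record one visit and return 1/sqrt(N) for the cell's visit count N."""
        key = self.cell(measure)
        self.counts[key] += 1
        return 1.0 / math.sqrt(self.counts[key])


def shaped_reward(imitation_reward, measure, archive, beta=0.1):
    """Hypothetical shaping: imitation reward plus a weighted exploration bonus."""
    return imitation_reward + beta * archive.bonus(measure)


archive = SingleStepArchive(low=0.0, high=1.0, bins=10)
print(archive.bonus([0.05, 0.95]))  # first visit to cell (0, 9): full bonus
print(archive.bonus([0.07, 0.93]))  # same cell again: bonus shrinks
print(archive.bonus([0.55, 0.55]))  # unvisited cell (5, 5): full bonus again
```

A count-based bonus is used here because it is the simplest novelty signal that decays as a region of measure space fills up. The actual method may use a different bonus or schedule.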
With these improvements, WQDIL significantly outperforms existing imitation learning methods on continuous control tasks derived from MuJoCo environments. It also learns diverse, high-performance policies from limited demonstration data.

Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration

Quality Diversity Imitation Learning

Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration

Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations

Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics

Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments

Quality Diversity for Robot Learning: Limitations and Future Directions

Data Quality in Imitation Learning

Limited Preference Aided Imitation Learning from Imperfect Demonstrations

How to Leverage Diverse Demonstrations in Offline Imitation Learning

Reward-free World Models for Online Imitation Learning

Self-Practice Imitation Learning From Weak Policy

Seeing Differently, Acting Similarly: Heterogeneously Observable Imitation Learning

Extraneousness-Aware Imitation Learning

Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms

Learning to Discern: Imitating Heterogeneous Human Demonstrations with Preference and Representation Learning

Quality-Similar Diversity via Population Based Reinforcement Learning

Iteratively Learning Novel Strategies with Diversity Measured in State Distances

DIDA: Denoised Imitation Learning based on Domain Adaptation