Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration

Xingrui Yu, Zhenglin Wan, David Mark Bossens, Yueming Lyu, Qing Guo, Ivor W. Tsang
2024-11-11
Abstract: Learning diverse and high-performance behaviors from a limited set of demonstrations is a grand challenge. Traditional imitation learning methods usually fail in this task because most of them are designed to learn one specific behavior even with multiple demonstrations. Therefore, novel techniques for quality diversity imitation learning are needed to solve the above challenge. This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL), which 1) improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE), and 2) mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus. Empirically, our method significantly outperforms state-of-the-art IL methods, achieving near-expert or beyond-expert QD performance on the challenging continuous control tasks derived from MuJoCo environments.
Machine Learning, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper addresses the challenge of learning diverse, high-performance behaviors from limited demonstration data. Traditional imitation learning (IL) methods usually learn only a single specific behavior, even when multiple demonstrations are available, so they perform poorly on tasks that require a diverse repertoire of behaviors. Specifically, the paper identifies and attempts to solve two main problems:

1. **Training instability**: Adversarial imitation learning (AIL) methods such as GAIL are prone to unstable training, which leads to worse-than-expected performance.
2. **Behavior overfitting**: When the demonstrations cover only a few behavior patterns, adversarial imitation learning methods may overfit to these known patterns and therefore fail to guide the agent toward more diverse behaviors.

To address these problems, the paper proposes Wasserstein Quality Diversity Imitation Learning (WQDIL) combined with a Single-Step Archive Exploration (SSAE) strategy. By using a Wasserstein Auto-Encoder (WAE) and a measure-conditioned reward function, WQDIL trains the reward model more stably and encourages the agent to explore a more diverse behavior space, achieving both higher diversity and higher performance.

In summary, the main contributions of this paper are:

- Identifying the training instability and behavior overfitting problems in adversarial quality diversity imitation learning (QDIL).
- Proposing WQDIL, which improves training stability by applying Wasserstein adversarial training in the latent space of a WAE.
- Designing a measure-conditioned reward function with a single-step archive exploration bonus to alleviate behavior overfitting and encourage the agent to explore more diverse behaviors.
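The idea of Wasserstein adversarial training in a latent space can be illustrated with a minimal sketch. It assumes a fixed linear encoder in place of the trained WAE encoder and a linear critic on the latent codes; the names `encode`, `critic`, `train_critic`, and `latent_reward` are hypothetical, and this is not the authors' implementation. The critic minimizes E[f(z_agent)] - E[f(z_expert)] plus a WGAN-GP-style gradient penalty, and its score on the agent's encoded state-action features serves as the imitation reward.

```python
import math
import random


def encode(x, W):
    """Hypothetical linear encoder z = W x, standing in for a trained WAE encoder."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def critic(z, w):
    """Linear critic f(z) = w . z evaluated on a latent code."""
    return sum(wi * zi for wi, zi in zip(w, z))


def mean_vec(vs):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(v[i] for v in vs) / len(vs) for i in range(len(vs[0]))]


def critic_loss_and_grad(z_expert, z_agent, w, gp_weight=10.0):
    """Wasserstein critic loss E[f(z_agent)] - E[f(z_expert)] + gradient penalty.

    For a linear critic, grad_z f = w everywhere, so the WGAN-GP penalty
    (||grad_z f|| - 1)^2 reduces to (||w|| - 1)^2 and its gradient is analytic.
    """
    mu_e, mu_a = mean_vec(z_expert), mean_vec(z_agent)
    norm = math.sqrt(sum(wi * wi for wi in w)) or 1e-12
    loss = critic(mu_a, w) - critic(mu_e, w) + gp_weight * (norm - 1.0) ** 2
    grad = [a - e + gp_weight * 2.0 * (norm - 1.0) * wi / norm
            for a, e, wi in zip(mu_a, mu_e, w)]
    return loss, grad


def train_critic(z_expert, z_agent, dim, steps=200, lr=0.05, seed=0):
    """Plain gradient descent on the critic loss; the encoder stays fixed in this sketch."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    for _ in range(steps):
        _, grad = critic_loss_and_grad(z_expert, z_agent, w)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def latent_reward(x, W, w):
    """Imitation reward for the agent: critic score of its encoded state-action features."""
    return critic(encode(x, W), w)


if __name__ == "__main__":
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical fixed encoder weights
    expert_x = [[1.0, 0.1, 0.5], [0.9, 0.0, 0.4]]  # toy expert features
    agent_x = [[0.1, 0.9, 0.5], [0.0, 1.0, 0.6]]   # toy agent features
    z_e = [encode(x, W) for x in expert_x]
    z_a = [encode(x, W) for x in agent_x]
    w = train_critic(z_e, z_a, dim=2)
    print(latent_reward(expert_x[0], W, w) > latent_reward(agent_x[0], W, w))  # True
```

Keeping the critic linear lets the gradient penalty be written in closed form. With a neural critic, one would instead evaluate the penalty on interpolated latents using automatic differentiation, and the WAE encoder would be trained jointly rather than held fixed.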
Through these improvements, WQDIL significantly outperforms existing imitation learning methods on continuous control tasks derived from MuJoCo environments, achieving near-expert or beyond-expert QD performance and learning diverse, high-performance policies from limited demonstration data.
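The measure-conditioned reward and the single-step archive exploration bonus can also be sketched, although the summary does not give the paper's exact formulas. This example therefore assumes (1) a bilinear reward model z^T A m that scores a latent code z together with a measure vector m, and (2) a count-based bonus beta / sqrt(1 + N(cell)) over a grid archive that records the measure of each individual step. `SingleStepArchive`, `measure_conditioned_reward`, and `total_reward` are hypothetical names, not the authors' API.

```python
import math
from collections import defaultdict


class SingleStepArchive:
    """Hypothetical grid archive over per-step measure vectors in [low, high]^d.

    Assumption: each environment step yields its own measure vector, and the
    archive only counts how often each grid cell has been visited.
    """

    def __init__(self, low, high, cells_per_dim):
        self.low, self.high, self.n = low, high, cells_per_dim
        self.visits = defaultdict(int)

    def cell(self, m):
        """Map a measure vector to a grid-cell index tuple, clipping out-of-range values."""
        idx = []
        for v in m:
            clipped = min(max(v, self.low), self.high)
            frac = (clipped - self.low) / (self.high - self.low)
            idx.append(min(int(frac * self.n), self.n - 1))
        return tuple(idx)

    def bonus(self, m, beta=1.0):
        """Count-based bonus (assumed form): beta / sqrt(1 + visits), then record the visit."""
        c = self.cell(m)
        b = beta / math.sqrt(1 + self.visits[c])
        self.visits[c] += 1
        return b


def measure_conditioned_reward(z, m, A):
    """Hypothetical bilinear reward z^T A m, so the score of z depends on the measure m."""
    return sum(z[i] * A[i][j] * m[j] for i in range(len(z)) for j in range(len(m)))


def total_reward(z, m, A, archive, beta=0.5):
    """Reward used for policy updates in this sketch: conditioned reward plus exploration bonus."""
    return measure_conditioned_reward(z, m, A) + archive.bonus(m, beta)


if __name__ == "__main__":
    archive = SingleStepArchive(0.0, 1.0, cells_per_dim=10)
    A = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical reward-model weights
    z = [1.0, 2.0]                # toy latent code
    print(total_reward(z, [0.5, 1.0], A, archive))  # first visit: 2.5 + 0.5
    print(total_reward(z, [0.5, 1.0], A, archive))  # repeat visit: smaller bonus
```

Because visits are recorded inside `bonus`, repeated steps in the same cell earn a shrinking bonus, which is one simple way to push the policy toward under-explored regions of the measure space; the bonus used in the paper may take a different form.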