Abstract: Expert demonstrations in imitation learning often contain several behavioral modes. In driving tasks, for example, the modes may include driving on the left, keeping the lane, and driving on the right. Although most existing multi-modal imitation learning methods can learn from demonstrations of multiple modes, they place strict constraints on the data of each mode and generally require roughly equal amounts of data for all modes. When this requirement is not met, they tend to suffer mode collapse or learn only the data distribution of the mode with the largest data volume. To address this problem, this paper proposes BAlanced Generative Adversarial Imitation Learning (BAGAIL), an algorithm that balances the real-fake loss and the classification loss by modifying the output of the discriminator. With this modification, the generator is rewarded only for generating trajectories that look real and belong to the correct mode. BAGAIL is therefore able to handle imbalanced expert demonstrations and learn each mode efficiently. The learning process of BAGAIL is divided into a pre-training stage and an imitation learning stage. During the pre-training stage, BAGAIL initializes the generator parameters by conditional Behavioral Cloning, which provides a starting point that guides subsequent parameter optimization. During the imitation learning stage, BAGAIL optimizes the parameters through the adversarial game between the generator and the modified discriminator, so that the final policy learns the distribution of the imbalanced expert data. Experiments show that BAGAIL accurately distinguishes different behavioral modes when trained on imbalanced demonstrations. Moreover, the learned behavior for each mode is close to the expert level and more stable than that of other multi-modal imitation learning methods.
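The central idea, a discriminator whose output merges the real/fake decision with mode classification so that the generator is rewarded only for realistic trajectories of the requested mode, can be illustrated with a small sketch. This is not BAGAIL's actual formulation, which the abstract does not spell out. It assumes a discriminator with K+1 output logits (K classes meaning "real, mode k" plus one "fake" class, a layout borrowed from semi-supervised GAN discriminators), and the function names `log_softmax`, `discriminator_loss`, and `generator_reward` are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def discriminator_loss(expert_logits, expert_modes, gen_logits):
    """Cross-entropy over K+1 classes (hypothetical layout, not the paper's).

    expert_logits: (N, K+1) discriminator outputs on expert (s, a) pairs
    expert_modes:  (N,) integer mode labels in [0, K)
    gen_logits:    (M, K+1) discriminator outputs on generated (s, a) pairs
    Assumption: index K means "fake"; indices 0..K-1 mean "real, mode k".
    """
    fake_class = expert_logits.shape[-1] - 1
    expert_lp = log_softmax(expert_logits)
    gen_lp = log_softmax(gen_logits)
    # Expert samples must be judged real AND assigned to their own mode,
    # so the real/fake and classification objectives share one softmax.
    expert_term = -expert_lp[np.arange(len(expert_modes)), expert_modes].mean()
    # Generated samples must be judged fake.
    gen_term = -gen_lp[:, fake_class].mean()
    return expert_term + gen_term


def generator_reward(gen_logits, target_modes):
    """Reward = log-probability of the class "real, requested mode".

    The reward is high only when a sample looks real AND matches the mode
    the policy was conditioned on; a real-looking sample of the wrong mode
    receives a low reward.
    """
    lp = log_softmax(gen_logits)
    return lp[np.arange(len(target_modes)), target_modes]


# Usage: K = 3 modes, one generated sample that looks like real mode-0 data.
logits = np.array([[2.0, 0.0, 0.0, 0.0]])
print(generator_reward(logits, np.array([0])))  # requested mode 0: higher reward
print(generator_reward(logits, np.array([1])))  # requested mode 1: lower reward
```

Under these assumptions, a policy asked for a minority mode gains nothing by reproducing the majority mode, because the reward depends on the requested mode and not only on realism. How BAGAIL actually weights the two loss terms may differ from this single-softmax choice.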

BAGAIL: Multi-modal imitation learning from imbalanced demonstrations
