A Perturbation-Based Policy Distillation Framework with Generative Adversarial Nets

Lihua Zhang, Quan Liu, Xiongzhen Zhang, Yapeng Xu
DOI: https://doi.org/10.1109/icassp49357.2023.10096272
2023-01-01
Abstract: We study the problem of imitation learning in automated decision systems, in which a learner is trained to imitate an expert demonstrator. A widely used approach is adversarial imitation learning, which alternately optimizes a generator (the learner) and a discriminator (the reward function). However, the discriminator is biased during the initial and intermediate stages of training. As a result, the learner's gradient descent direction is misguided, which leads to unstable training and high sample complexity. In this paper, we propose deep imitation learning through a guidance-based policy distillation (GIL) algorithm. First, GIL introduces a teacher model, the guidance-based variational autoencoder, which is pre-trained on expert demonstrations. Then, GIL uses a perturbation-based policy distillation method in which the teacher model guides the learner in the correct optimization direction, enabling the learner to imitate the expert policy with fewer detours. Experimental results show that our approach achieves higher sample efficiency than multiple baselines.
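To make the two ingredients named in the abstract concrete, here is a minimal Python sketch; it is not the authors' GIL implementation. It shows (1) one common way an adversarial imitation learner turns a discriminator output into a reward, assuming the convention that D(s, a) is the probability that a state-action pair came from the expert and r = -log(1 - D), and (2) how a pre-trained teacher's action distribution can be added to the learner's objective as a KL penalty. The guidance-based VAE teacher and the perturbation scheme are not reproduced; the function names, the discrete action space, and the `weight` hyperparameter are hypothetical.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_reward(logit: float) -> float:
    """Surrogate reward r(s, a) = -log(1 - D(s, a)) with D = sigmoid(logit).

    Assumed convention: D(s, a) is the discriminator's probability that (s, a)
    came from the expert. Since 1 - sigmoid(x) = sigmoid(-x), -log(1 - D)
    equals softplus(logit), which avoids log(0) for large logits.
    """
    return softplus(logit)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two categorical distributions over the same actions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing to KL(p || q).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)


def guided_policy_loss(
    adversarial_loss: float,
    teacher_probs: Sequence[float],
    learner_probs: Sequence[float],
    weight: float = 0.5,
) -> float:
    """Hypothetical learner objective: the adversarial (policy-gradient) loss,
    computed elsewhere, plus a weighted KL penalty that pulls the learner's
    action distribution toward the pre-trained teacher's. `weight` is an
    illustrative hyperparameter, not a value from the paper."""
    return adversarial_loss + weight * kl_divergence(teacher_probs, learner_probs)


if __name__ == "__main__":
    # A pair the discriminator rates as expert-like earns a larger reward.
    for logit in (2.0, -2.0):
        print(f"logit={logit:+.1f}  reward={discriminator_reward(logit):.4f}")
    # The teacher term grows as the learner drifts away from the teacher.
    print(guided_policy_loss(1.0, [0.7, 0.3], [0.7, 0.3]))
    print(guided_policy_loss(1.0, [0.7, 0.3], [0.2, 0.8]))
```

Under these assumptions, the KL term supplies a training signal that does not depend on the discriminator, which illustrates why a pre-trained teacher can help while the discriminator is still biased early in training.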