Best-in-class Imitation: Non-negative Positive-unlabeled Imitation Learning from Imperfect Demonstrations

Lin Zhang, Fei Zhu, Xinghong Ling, Quan Liu
DOI: https://doi.org/10.1016/j.ins.2022.04.015
IF: 8.1
2022-01-01
Information Sciences
Abstract: Although imitation learning can learn an optimal policy from expert demonstrations, it may fail to transfer to practical environments because high-quality demonstrations are difficult to collect; with imperfect demonstrations, the resulting policy is not accurate enough and converges slowly. To solve this problem, an algorithm is proposed that uses Non-negative Positive-unlabeled learning (nnPU) as a probabilistic classifier to evaluate the quality of demonstrations. Referred to as Non-negative Positive-unlabeled Importance Weighting Imitation Learning (PUIWIL), it is designed to increase the utilization of imperfect demonstrations and improve the performance of imitation learning. PUIWIL introduces confidence scores, calculated by the nnPU classifier, for expert demonstrations; each score indicates the probability that the demonstration was generated by an optimal policy, and all expert demonstrations are reweighted according to their confidence scores. In addition, PUIWIL restructures the standard GAIL framework so that high-quality demonstrations have a more significant impact on imitation learning, an approach called Best-in-class Imitation. The experiments demonstrate that PUIWIL improves both the performance and robustness of imitation learning from imperfect demonstrations. (C) 2022 Elsevier Inc. All rights reserved.
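
The two ingredients named in the abstract, a non-negative PU risk and confidence-based reweighting of expert samples, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the commonly used form of the nnPU risk with a sigmoid loss, a known class prior `prior` (the fraction of optimal demonstrations), confidence taken as the sigmoid of the classifier score, and a GAIL discriminator that outputs the probability that a sample comes from the expert. All function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def nnpu_risk(scores_pos, scores_unl, prior):
    """Non-negative PU risk with the sigmoid loss l(z, y) = sigmoid(-y * z).

    scores_pos: classifier scores on labeled-positive (known optimal) samples.
    scores_unl: classifier scores on unlabeled (mixed-quality) samples.
    prior:      assumed class prior, i.e. the fraction of positives.
    """
    risk_pos_as_pos = np.mean(sigmoid(-scores_pos))  # positives treated as +1
    risk_pos_as_neg = np.mean(sigmoid(scores_pos))   # positives treated as -1
    risk_unl_as_neg = np.mean(sigmoid(scores_unl))   # unlabeled treated as -1
    # Estimated risk on the negative class; clamped at zero so the
    # total empirical risk cannot become negative.
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, negative_part)


def confidence_weights(scores):
    """Map classifier scores to confidences, normalized to mean 1 (assumption)."""
    conf = sigmoid(scores)
    return conf / conf.mean()


def weighted_discriminator_loss(d_expert, d_policy, weights, eps=1e-8):
    """Cross-entropy discriminator loss with per-sample expert weights.

    d_expert / d_policy: discriminator outputs in (0, 1) on expert and
    policy samples, read as the probability of being an expert sample.
    """
    expert_term = -np.mean(weights * np.log(d_expert + eps))
    policy_term = -np.mean(np.log(1.0 - d_policy + eps))
    return expert_term + policy_term
```

In this sketch, demonstrations with higher confidence contribute more to the expert term of the discriminator loss, which is one simple way to let high-quality demonstrations dominate training.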