Imitation Learning from Suboptimal Demonstrations Via Discriminator Weighting

Liuyuan He, Guoyu Zuo, Jiangeng Li, Shuangyue Yu
DOI: https://doi.org/10.1109/ccdc62350.2024.10588080
2024-01-01
Abstract: Imitation learning algorithms for robotics applications require a sufficient amount of optimal data to learn well-performing policies. State-of-the-art approaches rely on pre-labeled data or interaction with the environment to filter out suboptimal data, which is time-consuming and laborious in practice. In this paper, we propose a new approach that requires neither manual labeling nor environment interaction. We add a discriminator to the behavioral cloning approach that distinguishes optimal from suboptimal data, so that policy learning is steered away from suboptimal behaviors. Within this framework, we design a new imitation learning algorithm that uses the discriminator's output as weights to learn efficiently on datasets containing suboptimal data. We evaluate the proposed method in four environments and compare it with three benchmark methods. The results show that our method performs better on datasets containing suboptimal data. The proposed method can identify the higher-value data in a dataset and enables the agent to learn a high-performance policy from imperfect demonstrations or from a small amount of data.
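The idea of using discriminator output as weights in behavioral cloning can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no loss function, network architecture, or discriminator training procedure, so everything here is an assumption. In particular, the discriminator scores are taken as given values in [0, 1], the policy is a simple linear map, and the function names `weighted_bc_loss` and `fit_weighted_linear_policy` are hypothetical.

```python
import numpy as np


def weighted_bc_loss(pred_actions, demo_actions, disc_scores):
    """Behavioral cloning loss with per-sample weights.

    disc_scores are hypothetical discriminator outputs in [0, 1]; a higher
    score is assumed to mean the sample looks more like optimal data.
    """
    pred_actions = np.asarray(pred_actions, dtype=float)
    demo_actions = np.asarray(demo_actions, dtype=float)
    weights = np.asarray(disc_scores, dtype=float)
    # Squared error per sample, summed over action dimensions.
    per_sample = np.sum((pred_actions - demo_actions) ** 2, axis=-1)
    # Normalize by the total weight (small constant avoids division by zero).
    return float(np.sum(weights * per_sample) / (np.sum(weights) + 1e-8))


def fit_weighted_linear_policy(states, actions, disc_scores):
    """Fit a linear policy a = s @ W by weighted least squares (an assumed toy policy class)."""
    s = np.asarray(states, dtype=float)
    a = np.asarray(actions, dtype=float)
    # Scaling rows by sqrt(weight) turns ordinary least squares into weighted least squares.
    root_w = np.sqrt(np.asarray(disc_scores, dtype=float))[:, None]
    W, *_ = np.linalg.lstsq(s * root_w, a * root_w, rcond=None)
    return W


# Toy dataset: the first three samples follow a = 2 * s ("optimal"),
# the last three follow a = -s ("suboptimal").
states = np.array([[1.0], [2.0], [3.0], [1.0], [2.0], [3.0]])
actions = np.array([[2.0], [4.0], [6.0], [-1.0], [-2.0], [-3.0]])
scores = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # assumed discriminator output

W_weighted = fit_weighted_linear_policy(states, actions, scores)
W_uniform = fit_weighted_linear_policy(states, actions, np.ones(len(states)))
```

In this toy setting, weighting by the assumed scores recovers the slope of the optimal samples, while uniform weighting (plain behavioral cloning) fits a compromise between the two behaviors.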