Ranking-Based Generative Adversarial Imitation Learning
Zhipeng Shi, Xuehe Zhang, Yu Fang, Changle Li, Gangfeng Liu, Jie Zhao
DOI: https://doi.org/10.1109/lra.2024.3451360
2024-01-01
Abstract: In imitation learning, demonstration data are often assumed to be optimal, even though they are imperfect in practice. Imperfect demonstrations result from expert errors, large-scale demonstration data, and the non-convexity of the task's solution space. In this letter, we propose ranking-based Generative Adversarial Imitation Learning (RB-GAIL), which handles such imperfect datasets by using the generated experiences more efficiently and avoiding dependence on a large number of different expert demonstrations. A rigorous mathematical analysis indicates that RB-GAIL can implicitly model the modes of the expert data by weighting multiple discriminators, and that a monotonically increasing positive activation function helps the model converge to the globally optimal solution. Experimental results show that our method surpasses the baseline methods when trained on imperfect demonstrations. In the Ant task, ours increased by 4.7% relative to the optimal expert level, whereas Trajectory-ranked Reward Extrapolation (T-REX) decreased by 12.2%, Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning (UID) decreased by 31.01%, and Wasserstein Adversarial Imitation Learning (WAIL) dropped by 96.0%. In physical manipulation experiments, our proposed method achieved a success rate of 100% (WAIL: under 90%).
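The abstract does not give the formulation, so the following is only an illustrative sketch of the two ingredients it names: a weighted combination of several discriminator outputs, and a monotonically increasing positive activation. Everything specific here is an assumption and not taken from the letter: the choice of softplus as the activation, the normalized weighted average, and the names `softplus` and `weighted_reward`.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """log(1 + e^x): monotonically increasing and positive.

    Assumed activation for illustration only; the letter's actual
    activation function is not stated in the abstract.
    """
    # Numerically stable form that avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_reward(logits: Sequence[float], weights: Sequence[float]) -> float:
    """Combine the raw outputs of several discriminators into one reward.

    logits:  one raw score per discriminator for a single (state, action) pair.
    weights: one non-negative weight per discriminator (hypothetical; how
             RB-GAIL actually sets these weights is not given in the abstract).
    """
    if len(logits) != len(weights):
        raise ValueError("logits and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Normalized weighted average of the activated discriminator scores.
    return sum(w * softplus(z) for w, z in zip(weights, logits)) / total


if __name__ == "__main__":
    # Three hypothetical discriminators scoring the same sample.
    print(weighted_reward([-1.0, 0.5, 2.0], [0.2, 0.3, 0.5]))
```

Because the activation is increasing and the weights are non-negative, raising any single discriminator's score never lowers the combined reward in this sketch.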