IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning

Ryan Hoque, Ajay Mandlekar, Caelan Garrett, Ken Goldberg, Dieter Fox
2024-05-03
Abstract: Imitation learning is a promising paradigm for training robot control policies, but these policies can suffer from distribution shift, where the conditions at evaluation time differ from those in the training data. A popular approach for increasing policy robustness to distribution shift is interactive imitation learning (i.e., DAgger and variants), where a human operator provides corrective interventions during policy rollouts. However, collecting a sufficient amount of interventions to cover the distribution of policy mistakes can be burdensome for human operators. We propose IntervenGen (I-Gen), a novel data generation system that can autonomously produce a large set of corrective interventions with rich coverage of the state space from a small number of human interventions. We apply I-Gen to 4 simulated environments and 1 physical environment with object pose estimation error and show that it can increase policy robustness by up to 39x with only 10 human interventions. Videos and more results are available at
Robotics, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses distribution shift in robot imitation learning: a learned policy performs worse at deployment because the conditions it encounters differ from those in its training data. The specific case studied is object pose estimation error. When a robot acts on observed object poses, factors such as sensor noise, occlusion, network latency, or model mis-specification can make the estimated position of a key object inaccurate. The robot then visits states that never appeared in the training data, and the policy fails.

Existing ways to make policies robust to such shifts are either to collect a large number of demonstrations under varied conditions or to use interactive imitation learning (e.g., DAgger and its variants), in which a human operator provides corrective interventions during policy execution. Both are expensive in human labor. The first requires substantial time and resources for data collection. The second requires an operator to monitor the robot continuously and step in whenever it makes a mistake.

The paper proposes a data generation system called IntervenGen (I-Gen), which automatically generates a large set of corrective interventions from a small number of human interventions, so that the resulting data covers a broader region of the state space and a wider range of policy mistakes. The aim is to improve policy robustness while reducing reliance on human operators. In experiments on 4 simulated environments and 1 physical environment with object pose estimation error, I-Gen increases policy robustness by up to 39x using only 10 human interventions.
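To make the idea of expanding a few human corrections into many concrete, here is a minimal toy sketch in Python. It is not the I-Gen algorithm, whose details are not given in this summary. It assumes a 2D, translation-only world, and every name in it (`synthesize_interventions`, `true_pos`, `observed_pos`, `pose_range`, `error_range`) is hypothetical. Each recorded correction is stored as end-effector waypoints, re-expressed relative to the true object position, and replayed under a newly sampled object position and a newly sampled pose estimation error.

```python
import random


def synthesize_interventions(source_segments, num_generated,
                             pose_range=0.2, error_range=0.05, seed=0):
    """Replay recorded corrective segments under new object poses and pose errors.

    Toy illustration only (2D, translation-only); all names are hypothetical.
    Each source segment is a dict with:
      "object_pos": (x, y) true object position during the human correction
      "waypoints":  list of (x, y) end-effector positions in the correction
    """
    rng = random.Random(seed)
    generated = []
    for _ in range(num_generated):
        src = rng.choice(source_segments)
        ox, oy = src["object_pos"]
        # Express the human correction relative to the object it targets.
        relative = [(wx - ox, wy - oy) for wx, wy in src["waypoints"]]
        # Sample a new true object position for the synthetic scene.
        tx = rng.uniform(-pose_range, pose_range)
        ty = rng.uniform(-pose_range, pose_range)
        # Sample a new pose estimation error; this is what the policy would observe.
        ex = rng.uniform(-error_range, error_range)
        ey = rng.uniform(-error_range, error_range)
        generated.append({
            "true_pos": (tx, ty),
            "observed_pos": (tx + ex, ty + ey),
            # The correction is re-anchored to the new true object position.
            "waypoints": [(tx + dx, ty + dy) for dx, dy in relative],
        })
    return generated


# Two hand-written "human" corrections, each ending at the object.
source_segments = [
    {"object_pos": (0.0, 0.0),
     "waypoints": [(0.05, 0.03), (0.02, 0.01), (0.0, 0.0)]},
    {"object_pos": (0.1, -0.1),
     "waypoints": [(0.06, -0.12), (0.1, -0.1)]},
]
generated = synthesize_interventions(source_segments, num_generated=100)
```

In this sketch each synthetic sample pairs an erroneous observation with a correction that still ends at the true object position, which is the kind of recovery data a human would otherwise have to provide by hand. A real system would also need to handle rotations, execute the segments in a simulator, and discard unsuccessful ones. Those steps are omitted here.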