Instant Policy: In-Context Imitation Learning via Graph Diffusion

Vitalis Vosylius, Edward Johns
2024-11-20
Abstract: Following the impressive capabilities of in-context learning with large transformers, In-Context Imitation Learning (ICIL) is a promising opportunity for robotics. We introduce Instant Policy, which learns new tasks instantly (without further training) from just one or two demonstrations, achieving ICIL through two key components. First, we introduce inductive biases through a graph representation and model ICIL as a graph generation problem with a learned diffusion process, enabling structured reasoning over demonstrations, observations, and actions. Second, we show that such a model can be trained using pseudo-demonstrations (arbitrary trajectories generated in simulation) as a virtually infinite pool of training data. Simulated and real experiments show that Instant Policy enables rapid learning of various everyday robot tasks. We also show how it can serve as a foundation for cross-embodiment and zero-shot transfer to language-defined tasks. Code and videos are available at https://www.robot-learning.uk/instant-policy.
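To make "a learned diffusion process" concrete, the sketch below runs a standard DDPM-style reverse (denoising) loop over a small vector of action values. It is only an illustration of the generic sampling procedure, not the authors' implementation: Instant Policy denoises a graph conditioned on demonstrations and observations, whereas here the learned denoiser is replaced by a hypothetical `oracle_eps` function that already knows the target action. The linear noise schedule, the 50 steps, and all function names are assumptions made for this sketch.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an arbitrary choice for this sketch)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars_from(betas):
    """Cumulative products abar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def reverse_diffusion(predict_eps, dim, num_steps=50, seed=0):
    """DDPM ancestral sampling: start from Gaussian noise and denoise step by step.

    predict_eps(x_t, t) stands in for a learned noise-prediction network.
    """
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    alpha_bars = alpha_bars_from(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps = predict_eps(x, t)
        alpha_t = 1.0 - betas[t]
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alpha_t) for xi, ei in zip(x, eps)]
        if t > 0:
            # Posterior variance: beta_t * (1 - abar_{t-1}) / (1 - abar_t)
            var = betas[t] * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t])
            x = [m + math.sqrt(var) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


def make_oracle_eps(x0, num_steps):
    """Hypothetical stand-in for a trained denoiser: returns the exact noise
    that relates x_t to the known target x0 under the forward process."""
    alpha_bars = alpha_bars_from(linear_betas(num_steps))

    def oracle_eps(x_t, t):
        s = math.sqrt(alpha_bars[t])
        d = math.sqrt(1.0 - alpha_bars[t])
        return [(xi - s * x0i) / d for xi, x0i in zip(x_t, x0)]

    return oracle_eps


target_action = [0.10, -0.05, 0.30]  # hypothetical 3-D action, e.g. a gripper displacement
sample = reverse_diffusion(make_oracle_eps(target_action, 50), dim=3, num_steps=50)
print([round(v, 4) for v in sample])  # → [0.1, -0.05, 0.3]
```

With a perfect noise predictor the sampler returns the target exactly. A trained denoiser only approximates this, and in Instant Policy its prediction is conditioned on the graph that links demonstrations, observations, and actions.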
Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning
### What problem does this paper attempt to solve?

This paper addresses In-Context Imitation Learning (ICIL) in robot learning. Specifically, the authors propose a method named **Instant Policy**, which enables a robot to perform a new task immediately after one or two demonstrations, without further training.

#### Main challenges:

1. **Limited data**: Unlike natural language processing and computer vision, robot learning lacks large-scale, diverse datasets.
2. **High cost of data collection**: Collecting robot data manually is both time-consuming and expensive.

#### Solutions:

1. **Graph representation**: The demonstrations, the current point-cloud observation, and the robot's actions are combined into a single graph, so that ICIL is modeled as a diffusion-based graph generation problem. This structure lets the model interpret the demonstrations and the current observation in order to predict the robot's actions.
2. **Pseudo-demonstrations**: Arbitrary trajectories generated in simulation serve as training data. These pseudo-demonstrations provide a virtually unlimited data source, which sidesteps the cost of collecting data manually.

#### Specific goals:

- **Instant learning**: At test time, the robot learns a new task from only one or two demonstrations, with no additional training.
- **Efficient learning**: The structured graph representation and the diffusion process improve learning efficiency.
- **Generalization**: The model not only performs well on known tasks but also generalizes to unseen tasks and object geometries.

#### Experimental verification:

- **Simulation experiments**: Experiments on 24 RLBench tasks evaluate Instant Policy on a variety of everyday tasks.
- **Real-world experiments**: Experiments on a physical robot verify the model's effectiveness and generalization ability.

Overall, Instant Policy substantially improves the robot's ability to learn new tasks instantly, raises its success rate on those tasks, and outperforms the existing methods it is compared against.
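The pseudo-demonstration idea can be illustrated with a small generator. This is a toy under stated assumptions, not the authors' data pipeline: the source only says pseudo-demonstrations are arbitrary trajectories generated in simulation. Here a "pseudo-task" is assumed to be a random sequence of waypoints expressed relative to an object, each with a gripper open/closed state; rolling the same pseudo-task out at several random object positions yields demonstrations that are mutually consistent while differing in absolute position. The functions `sample_pseudo_task`, `interpolate`, and `make_pseudo_demos`, the waypoint count, step count, and workspace bounds are all hypothetical, and a real pipeline would also render point-cloud observations and full gripper poses.

```python
import random


def sample_pseudo_task(rng, num_waypoints=3, radius=0.1):
    """Hypothetical pseudo-task: waypoint offsets from an object, each with a gripper state."""
    task = []
    for _ in range(num_waypoints):
        offset = tuple(rng.uniform(-radius, radius) for _ in range(3))
        gripper_open = rng.random() < 0.5
        task.append((offset, gripper_open))
    return task


def interpolate(p, q, steps):
    """Linear interpolation from p (exclusive) to q (inclusive)."""
    return [tuple(pi + (qi - pi) * k / steps for pi, qi in zip(p, q))
            for k in range(1, steps + 1)]


def make_pseudo_demos(task, num_demos, rng, start=(0.0, 0.0, 0.5),
                      steps_per_segment=5, workspace=0.3):
    """Roll out one pseudo-task at several random object positions.

    Each demo is a list of (position, gripper_open) pairs; the gripper state
    switches only when a waypoint is reached.
    """
    demos = []
    for _ in range(num_demos):
        obj = (rng.uniform(-workspace, workspace), rng.uniform(-workspace, workspace), 0.0)
        demo, current, gripper = [(start, True)], start, True
        for offset, next_gripper in task:
            target = tuple(o + d for o, d in zip(obj, offset))
            path = interpolate(current, target, steps_per_segment)
            demo.extend((pos, gripper) for pos in path[:-1])
            gripper = next_gripper
            demo.append((path[-1], gripper))
            current = target
        demos.append(demo)
    return demos


rng = random.Random(0)
task = sample_pseudo_task(rng)
demos = make_pseudo_demos(task, num_demos=3, rng=rng)
context, query = demos[:2], demos[2]  # e.g. two demos as context, one as the trajectory to predict
print(len(context), len(query))  # → 2 16
```

Splitting each batch of pseudo-demonstrations into context demonstrations and a held-out rollout mirrors the one-or-two-demonstration setting described above; because the generator is procedural, such batches can be produced without limit.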