Abstract: Following the impressive capabilities of in-context learning with large transformers, In-Context Imitation Learning (ICIL) is a promising opportunity for robotics. We introduce Instant Policy, which learns new tasks instantly (without further training) from just one or two demonstrations, achieving ICIL through two key components. First, we introduce inductive biases through a graph representation and model ICIL as a graph generation problem with a learned diffusion process, enabling structured reasoning over demonstrations, observations, and actions. Second, we show that such a model can be trained using pseudo-demonstrations - arbitrary trajectories generated in simulation - as a virtually infinite pool of training data. Simulated and real experiments show that Instant Policy enables rapid learning of various everyday robot tasks. We also show how it can serve as a foundation for cross-embodiment and zero-shot transfer to language-defined tasks. Code and videos are available at https://www.robot-learning.uk/instant-policy.
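To make the idea of pseudo-demonstrations concrete, here is a minimal sketch of how such training data could be generated. It is not the authors' pipeline: the names `sample_pseudo_task`, `rollout`, and `make_training_sample`, the workspace ranges, and the choice to express waypoints relative to a single object position (translation only) are all assumptions made for illustration. The sketch only reflects the abstract's claim that arbitrary simulated trajectories can serve as training data, with some as context and one as the target to predict.

```python
import random
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Step = Tuple[float, float, float, bool]  # x, y, z, gripper_closed


def sample_pseudo_task(rng: random.Random, n_waypoints: int = 4):
    """Hypothetical pseudo-task: random waypoints relative to an object."""
    return [
        (
            (rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2), rng.uniform(0.0, 0.3)),
            rng.random() < 0.5,  # random gripper state at this waypoint
        )
        for _ in range(n_waypoints)
    ]


def rollout(task, object_pos: Vec3, steps_per_segment: int = 5) -> List[Step]:
    """Place the task's waypoints at the object and interpolate linearly."""
    keys = [
        (tuple(o + p for o, p in zip(object_pos, offset)), grip)
        for offset, grip in task
    ]
    traj: List[Step] = []
    for (a, grip), (b, _) in zip(keys, keys[1:]):
        for s in range(steps_per_segment):
            t = s / steps_per_segment
            traj.append((*(ai + (bi - ai) * t for ai, bi in zip(a, b)), grip))
    last_pos, last_grip = keys[-1]
    traj.append((*last_pos, last_grip))
    return traj


def make_training_sample(seed: int, n_context: int = 2) -> Dict[str, object]:
    """One sample: n_context demos plus a target demo of the same pseudo-task,
    each with a different (assumed) object position."""
    rng = random.Random(seed)
    task = sample_pseudo_task(rng)
    demos = []
    for _ in range(n_context + 1):
        object_pos = (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0), 0.0)
        demos.append(rollout(task, object_pos))
    return {"context": demos[:-1], "target": demos[-1]}
```

Because every sample comes from a seeded random generator, new samples can be produced on demand by changing the seed, which is the sense in which the pool of training data is virtually unlimited.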

### What problem does this paper attempt to solve?

This paper addresses the In-Context Imitation Learning (ICIL) problem in robot learning. Specifically, the authors propose a method named **Instant Policy**, which enables a robot to perform a new task immediately after one or two demonstrations, without further training.

#### Main challenges:

1. **Limited data**: Unlike natural language processing and computer vision, robot learning lacks large-scale, diverse datasets.
2. **High cost of data collection**: Collecting robot data manually is both time-consuming and expensive.

#### Solutions:

1. **Graph representation**: Demonstrations, current point-cloud observations, and the robot's actions are integrated into a unified graph, so that ICIL is modeled as a diffusion-based graph generation problem. This lets the model interpret the demonstrations and observations jointly to predict the robot's actions.
2. **Pseudo-demonstrations**: Arbitrary trajectories generated in simulation are used as training data. These pseudo-demonstrations provide an almost unlimited data source, which sidesteps the cost of collecting real robot data.

#### Specific goals:

- **Instant learning**: At test time, the robot learns a new task from only one or two demonstrations, with no additional training.
- **Efficient learning**: The structured graph representation and the diffusion process improve learning efficiency.
- **Generalization ability**: The model should perform well on known tasks and also generalize to unseen tasks and object geometries.

#### Experimental verification:

- **Simulation experiments**: Experiments on 24 RLBench tasks demonstrate the performance of Instant Policy on a variety of everyday tasks.
- **Real-world experiments**: Experiments on a real robot verify the effectiveness and generalization ability of the model.

With these components, Instant Policy learns new tasks from one or two demonstrations and is reported to achieve higher success rates on new tasks than existing methods.
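The graph representation described above can be sketched as a small data structure. This is an illustrative assumption, not the paper's implementation: the `Node` and `Graph` classes, the edge labels, and the connectivity pattern (scene points to gripper, demonstration steps linked in time, demonstrations to the current observation, current observation to candidate actions) are hypothetical choices that only mirror the summary's statement that demonstrations, observations, and actions share one graph.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    kind: str   # "scene" (a point-cloud point) or "gripper"
    group: str  # "demo", "current", or "action"
    step: int   # time index within its group
    pos: Vec3


@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Tuple[int, int, str]] = field(default_factory=list)

    def add_node(self, node: Node) -> int:
        self.nodes.append(node)
        return len(self.nodes) - 1


def add_local_subgraph(g: Graph, group: str, step: int,
                       scene_pts: List[Vec3], gripper_pos: Vec3) -> int:
    """Add one observation: a gripper node plus scene nodes linked to it."""
    grip = g.add_node(Node("gripper", group, step, gripper_pos))
    for p in scene_pts:
        s = g.add_node(Node("scene", group, step, p))
        g.edges.append((s, grip, "scene_to_gripper"))
    return grip


def build_icil_graph(demo, current, noisy_actions: List[Vec3]) -> Graph:
    """demo: list of (scene_pts, gripper_pos); current: (scene_pts, gripper_pos);
    noisy_actions: candidate future gripper positions to be refined."""
    g = Graph()
    demo_grips = [
        add_local_subgraph(g, "demo", t, pts, grip)
        for t, (pts, grip) in enumerate(demo)
    ]
    for a, b in zip(demo_grips, demo_grips[1:]):
        g.edges.append((a, b, "demo_temporal"))
    cur = add_local_subgraph(g, "current", 0, *current)
    for d in demo_grips:
        g.edges.append((d, cur, "demo_to_current"))
    for k, pos in enumerate(noisy_actions):
        a = g.add_node(Node("gripper", "action", k, pos))
        g.edges.append((cur, a, "current_to_action"))
    return g
```

In a diffusion-based formulation, the action nodes would start from noise and a learned network would repeatedly update their positions conditioned on the rest of the graph. That denoising network is outside the scope of this sketch.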

Instant Policy: In-Context Imitation Learning via Graph Diffusion

Few-Shot In-Context Imitation Learning via Implicit Graph Alignment

In-Context Imitation Learning via Next-Token Prediction

Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments

In-context Exploration-Exploitation for Reinforcement Learning

Invariant Causal Imitation Learning for Generalizable Policies

Efficient Robot Skill Learning with Imitation from a Single Video for Contact-Rich Fabric Manipulation

Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble

Transformers for One-Shot Visual Imitation

EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning

Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics

Learning One-Shot Imitation From Humans Without Humans

Robotic Imitation of Human Actions

IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning

Don't Start from Scratch: Behavioral Refinement via Interpolant-based Policy Diffusion

ReIL: A Framework for Reinforced Intervention-based Imitation Learning

Unpacking the Individual Components of Diffusion Policy

ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI