Abstract:Humans are able to seamlessly visually imitate others, by inferring their intentions and using past experience to achieve the same end goal. In other words, we can parse complex semantic knowledge from raw video and efficiently translate that into concrete motor control. Is it possible to give a robot this same capability? Prior research in robot imitation learning has created agents which can acquire diverse skills from expert human operators. However, expanding these techniques to work with a single positive example during test time is still an open challenge. Apart from control, the difficulty stems from mismatches between the demonstrator and robot domains. For example, objects may be placed in different locations (e.g. kitchen layouts are different in every house). Additionally, the demonstration may come from an agent with different morphology and physical appearance (e.g. human), so one-to-one action correspondences are not available. This paper investigates techniques which allow robots to partially bridge these domain gaps, using their past experience. A neural network is trained to mimic ground truth robot actions given context video from another agent, and must generalize to unseen task instances when prompted with new videos during test time. We hypothesize that our policy representations must be both context driven and dynamics aware in order to perform these tasks. These assumptions are baked into the neural network using the Transformers attention mechanism and a self-supervised inverse dynamics loss. Finally, we experimentally determine that our method accomplishes a $\sim 2$x improvement in terms of task success rate over prior baselines in a suite of one-shot manipulation tasks.

Towards Learning to Imitate from a Single Video Demonstration

Relational Mimic for Visual Adversarial Imitation Learning

A Dual Approach to Imitation Learning from Observations with Offline Datasets

Robotic Imitation of Human Actions

Transformers for One-Shot Visual Imitation

MimicPlay: Long-Horizon Imitation Learning by Watching Human Play

Imitation and Adaptation Based on Consistency: A Quadruped Robot Imitates Animals from Videos Using Deep Reinforcement Learning

Visual Imitation Made Easy

Manipulator-Independent Representations for Visual Imitation

Learning to Play by Imitating Humans

Robust Imitation of a Few Demonstrations with a Backwards Model

Zero-shot Imitation Policy via Search in Demonstration Dataset

Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

Robotic Action-state Evaluation Via Siamese Neural Network

Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations

A reinforcement learning algorithm acquires demonstration from the training agent by dividing the task space

Imitation Learning of Robotic Arm with Hierarchical Training Based on Human Videos

Latent Action Priors From a Single Gait Cycle Demonstration for Online Imitation Learning

Co-Imitation: Learning Design and Behaviour by Imitation

One ACT Play: Single Demonstration Behavior Cloning with Action Chunking Transformers