The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations

Jan Ole von Hartz, Tim Welschehold, Abhinav Valada, Joschka Boedecker
2024-10-23
Abstract: Task Parametrized Gaussian Mixture Models (TP-GMM) are a sample-efficient method for learning object-centric robot manipulation tasks. However, there are several open challenges to applying TP-GMMs in the wild. In this work, we tackle three crucial challenges synergistically. First, end-effector velocities are non-Euclidean and thus hard to model using standard GMMs. We thus propose to factorize the robot's end-effector velocity into its direction and magnitude, and model them using Riemannian GMMs. Second, we leverage the factorized velocities to segment and sequence skills from complex demonstration trajectories. Through the segmentation, we further align skill trajectories and hence leverage time as a powerful inductive bias. Third, we present a method to automatically detect relevant task parameters per skill from visual observations. Our approach enables learning complex manipulation tasks from just five demonstrations while using only RGB-D observations. Extensive experimental evaluations on RLBench demonstrate that our approach achieves state-of-the-art performance with 20-fold improved sample efficiency. Our policies generalize across different environments, object instances, and object positions, while the learned skills are reusable.
Robotics, Machine Learning
What problem does this paper attempt to address?
This paper addresses how a robot can efficiently learn complex, long-horizon manipulation tasks from only a few demonstrations while generalizing across environments, object instances, and object positions. Specifically, it proposes a method named TAPAS-GMM (Task Auto-Parameterized And Skill Segmented Gaussian Mixture Model), aiming to overcome several key challenges in existing methods:

1. **Modeling of velocity information**: Standard Gaussian Mixture Models (GMMs) struggle to model a robot's end-effector velocities, because velocity is naturally described by a direction and a magnitude, and such data are non-Euclidean. The paper therefore factorizes the end-effector velocity into its direction and magnitude and models both with Riemannian Gaussian Mixture Models (Riemannian GMMs).
2. **Skill segmentation and sequencing**: Demonstrations of complex tasks usually contain multiple skills, and these skills are poorly aligned in time across demonstrations, which results in a complex data distribution. The paper uses the factorized velocity information to segment and sequence skills, thereby achieving better temporal alignment, reducing the data dimensionality, and improving the learning efficiency of the model.
3. **Automatic extraction and selection of task parameters**: Automatically extracting task parameters from visual observations is an open problem, and existing methods rely on infrared motion-tracking devices, which limits where the model can be applied. The paper generates candidate task parameters from visual features and uses statistical methods to select the task parameters relevant to each skill, so that each skill is given only the information it needs.
4. **Sample efficiency and generalization ability**: TAPAS-GMM can learn complex manipulation tasks from only 5 demonstrations and generalizes well across different environments, object instances, and object positions. In addition, the learned skills can be flexibly recombined to solve new tasks.

Through extensive experimental evaluations on the RLBench benchmark, the paper demonstrates the performance of TAPAS-GMM in terms of sample efficiency, policy learning, and generalization ability. The experimental results show that TAPAS-GMM achieves state-of-the-art performance on multiple complex tasks and improves sample efficiency 20-fold compared with existing methods.
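The velocity factorization in item 1 and the velocity-based segmentation in item 2 can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `factorize_velocity` and `segment_by_speed`, the `eps` and `threshold` values, the fallback direction at zero speed, and the rule of cutting segments wherever the speed drops to near zero are all assumptions made for illustration. The summary above does not specify the actual segmentation criterion, and no Riemannian GMM is fitted here.

```python
import numpy as np


def factorize_velocity(v, eps=1e-8):
    """Split velocities of shape (T, 3) into unit directions (T, 3) and magnitudes (T,)."""
    v = np.asarray(v, dtype=float)
    mag = np.linalg.norm(v, axis=1)
    # Avoid division by zero; rows with (near-)zero speed are overwritten below.
    safe = np.where(mag > eps, mag, 1.0)
    direction = v / safe[:, None]
    # Direction is undefined at zero speed; use a fixed unit vector (assumption).
    direction[mag <= eps] = np.array([1.0, 0.0, 0.0])
    return direction, mag


def segment_by_speed(mag, threshold=1e-3):
    """Return (start, end) pairs, end exclusive, of contiguous runs with speed > threshold."""
    moving = np.asarray(mag) > threshold
    # Pad with False so every run has a detectable rising and falling edge.
    padded = np.concatenate(([False], moving, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(edges[0::2].tolist(), edges[1::2].tolist()))


# Toy trajectory: pause, two moving steps, pause, one moving step.
v = np.array([
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [0.0, 0.2, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.3],
])
direction, mag = factorize_velocity(v)
segments = segment_by_speed(mag)
print(segments)  # [(1, 3), (4, 5)]
```

Each direction is a unit vector, so it lies on the sphere rather than in a flat Euclidean space, which is why the summary describes this data as non-Euclidean and calls for a Riemannian model instead of a standard GMM. The magnitude is a non-negative scalar that can be handled separately.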