Abstract: Task Parametrized Gaussian Mixture Models (TP-GMM) are a sample-efficient method for learning object-centric robot manipulation tasks. However, there are several open challenges to applying TP-GMMs in the wild. In this work, we tackle three crucial challenges synergistically. First, end-effector velocities are non-Euclidean and thus hard to model using standard GMMs. We thus propose to factorize the robot's end-effector velocity into its direction and magnitude, and model them using Riemannian GMMs. Second, we leverage the factorized velocities to segment and sequence skills from complex demonstration trajectories. Through the segmentation, we further align skill trajectories and hence leverage time as a powerful inductive bias. Third, we present a method to automatically detect relevant task parameters per skill from visual observations. Our approach enables learning complex manipulation tasks from just five demonstrations while using only RGB-D observations. Extensive experimental evaluations on RLBench demonstrate that our approach achieves state-of-the-art performance with 20-fold improved sample efficiency. Our policies generalize across different environments, object instances, and object positions, while the learned skills are reusable.

What problem does this paper attempt to address?

This paper addresses how to learn complex, long-horizon robotic manipulation tasks efficiently from only a few demonstrations, while generalizing across environments, object instances, and object positions. It proposes a method named TAPAS-GMM (Task Auto-Parameterized And Skill Segmented Gaussian Mixture Model), which targets the following points:

1. **Modeling of velocity information**: Standard Gaussian Mixture Models (GMMs) struggle to model a robot's end-effector velocities, because velocity is naturally described by a direction and a magnitude, and this data is non-Euclidean. The paper therefore factorizes the end-effector velocity into its direction and magnitude and models both with Riemannian Gaussian Mixture Models (Riemannian GMMs).
2. **Skill segmentation and sequencing**: Demonstrations of complex tasks usually contain multiple skills, and these skills are poorly aligned in time across demonstrations, which results in a complex data distribution. The paper uses the factorized velocities to segment and sequence skills, which improves temporal alignment, reduces the data dimensionality, and makes learning more efficient.
3. **Automatic extraction and selection of task parameters**: Automatically extracting task parameters from visual observations is an open problem, and existing methods rely on infrared motion-tracking devices, which limits where the model can be applied. The paper generates candidate task parameters from visual features and uses statistical methods to select the task parameters relevant to each skill, so that each skill only uses the information it needs.
4. **Sample efficiency and generalization ability**: TAPAS-GMM learns complex manipulation tasks from only five demonstrations and generalizes across environments, object instances, and object positions. In addition, the learned skills can be recombined to solve new tasks.

The paper supports these claims with extensive experimental evaluations on the RLBench benchmark. The results show that TAPAS-GMM achieves state-of-the-art performance on multiple complex tasks with 20-fold improved sample efficiency compared with existing methods.
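The velocity factorization and velocity-based segmentation described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `factorize_velocity` and `segment_at_rest`, the rest threshold, and the rule of cutting a trajectory wherever the speed drops to near zero are all assumptions made for illustration. The summary states only that the factorized velocities are used for segmentation, not the exact criterion. The Riemannian GMM fit on the directions is omitted.

```python
import numpy as np


def factorize_velocity(vel, eps=1e-8):
    """Split (T, 3) velocities into unit directions (on the sphere) and magnitudes.

    The direction is undefined when the end-effector is at rest, so this
    sketch carries the last valid direction forward (an assumption).
    """
    vel = np.asarray(vel, dtype=float)
    mag = np.linalg.norm(vel, axis=1)
    direction = np.empty_like(vel)
    last = np.array([0.0, 0.0, 1.0])  # arbitrary default for a leading rest phase
    for t in range(len(vel)):
        if mag[t] > eps:
            last = vel[t] / mag[t]
        direction[t] = last
    return direction, mag


def segment_at_rest(mag, threshold=1e-3, min_len=2):
    """Return (start, end) index pairs of moving segments, end exclusive.

    Hypothetical criterion: a segment ends when the speed falls to or
    below `threshold`; segments shorter than `min_len` steps are dropped.
    """
    moving = np.asarray(mag) > threshold
    segments = []
    start = None
    for t, is_moving in enumerate(moving):
        if is_moving and start is None:
            start = t
        elif not is_moving and start is not None:
            if t - start >= min_len:
                segments.append((start, t))
            start = None
    if start is not None and len(moving) - start >= min_len:
        segments.append((start, len(moving)))
    return segments


# Toy trajectory: move along x, pause, then move along y.
vel = np.array([[0.1, 0.0, 0.0]] * 5 + [[0.0, 0.0, 0.0]] * 3 + [[0.0, 0.2, 0.0]] * 4)
direction, mag = factorize_velocity(vel)
segments = segment_at_rest(mag)
print(segments)
```

On this toy trajectory the pause separates the two motions, so each moving phase becomes its own candidate skill segment. The directions are unit vectors, which is why a manifold-aware model is needed for them rather than a standard Euclidean GMM.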

The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations

Task-Parameterized Imitation Learning with Time-Sensitive Constraints

Enhanced Task Parameterized Dynamic Movement Primitives by GMM to Solve Manipulation Tasks

Learning Generalizable 3D Manipulation With 10 Demonstrations

Learning Task Priorities from Demonstrations

Ensemble Bootstrapped Deep Deterministic Policy Gradient For Vision-Based Robotic Grasping

Bayesian Optimization for Sample-Efficient Policy Improvement in Robotic Manipulation

A Task-Learning Strategy for Robotic Assembly Tasks from Human Demonstrations

ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation

Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations

Learning Forceful Manipulation Skills from Multi-modal Human Demonstrations

Learning Cooperative Dynamic Manipulation Skills from Human Demonstration Videos

Effective Learning and Online Modulation for Robotic Variable Impedance Skills

Robot Skill Generalization via Keypoint Integrated Soft Actor-Critic Gaussian Mixture Models

Learning Multi-Step Manipulation Tasks from A Single Human Demonstration

Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

Learning Extrinsic Dexterity with Parameterized Manipulation Primitives

A Task Learning Mechanism for the Telerobots

Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration

Concept2Robot: Learning Manipulation Concepts from Instructions and Human Demonstrations