Abstract: Imitation Learning (IL) is a powerful paradigm for teaching robots to perform manipulation tasks by allowing them to learn from human demonstrations collected via teleoperation, but it has mostly been limited to single-arm manipulation. However, many real-world tasks require multiple arms, such as lifting a heavy object or assembling a desk. Unfortunately, applying IL to multi-arm manipulation tasks has been challenging: asking a human to control more than one robotic arm can impose a significant cognitive burden and is often only possible for a maximum of two robot arms. To address these challenges, we present Multi-Arm RoboTurk (MART), a multi-user data collection platform that allows multiple remote users to simultaneously teleoperate a set of robotic arms and collect demonstrations for multi-arm tasks. Using MART, we collected demonstrations for five novel two- and three-arm tasks from several geographically separated users. From our data we arrived at a critical insight: most multi-arm tasks do not require global coordination throughout their full duration, but only during specific moments. We show that learning from such data consequently presents challenges for centralized agents that directly attempt to model all robot actions simultaneously, and we perform a comprehensive study of different policy architectures with varying levels of centralization on our tasks. Finally, we propose and evaluate a base-residual policy framework that allows trained policies to better adapt to the mixed coordination setting common in multi-arm manipulation, and show that a centralized policy augmented with a decentralized residual model outperforms all other models on our set of benchmark tasks. Additional results and videos are available at https://roboturk.stanford.edu/multiarm.
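To make the base-residual idea concrete, here is a minimal sketch of how a centralized base policy and per-arm decentralized residuals could be composed at inference time. It is an illustration under stated assumptions, not the authors' implementation: the `LinearPolicy` and `BaseResidualPolicy` classes are hypothetical, fixed random linear maps stand in for trained networks, training is omitted entirely, and feeding each residual its own arm's observation together with that arm's base action is an assumed design choice.

```python
import random
from typing import List

Vector = List[float]


class LinearPolicy:
    """Hypothetical stand-in for a learned network: a fixed random linear map."""

    def __init__(self, in_dim: int, out_dim: int, scale: float = 0.1, seed: int = 0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
                        for _ in range(out_dim)]

    def __call__(self, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


class BaseResidualPolicy:
    """Hypothetical sketch: centralized base action plus per-arm residual corrections."""

    def __init__(self, num_arms: int, obs_dim: int, act_dim: int, seed: int = 0):
        self.num_arms, self.obs_dim, self.act_dim = num_arms, obs_dim, act_dim
        # Centralized base: sees every arm's observation, outputs every arm's action.
        self.base = LinearPolicy(num_arms * obs_dim, num_arms * act_dim, seed=seed)
        # Decentralized residuals (assumed inputs): one per arm, seeing only that
        # arm's observation concatenated with that arm's slice of the base action.
        self.residuals = [LinearPolicy(obs_dim + act_dim, act_dim, seed=seed + 1 + i)
                          for i in range(num_arms)]

    def act(self, per_arm_obs: List[Vector]) -> List[Vector]:
        assert len(per_arm_obs) == self.num_arms
        joint_obs = [v for obs in per_arm_obs for v in obs]
        joint_action = self.base(joint_obs)  # coordinated proposal for all arms
        actions = []
        for i, obs in enumerate(per_arm_obs):
            base_a = joint_action[i * self.act_dim:(i + 1) * self.act_dim]
            delta = self.residuals[i](list(obs) + base_a)  # local, per-arm correction
            actions.append([b + d for b, d in zip(base_a, delta)])
        return actions


if __name__ == "__main__":
    policy = BaseResidualPolicy(num_arms=3, obs_dim=4, act_dim=2)
    obs = [[0.1 * j for j in range(4)] for _ in range(3)]
    for arm, a in enumerate(policy.act(obs)):
        print(f"arm {arm}: {a}")
```

In this composition the base policy carries the global coordination, while each residual can only adjust its own arm from local information; if every residual outputs zero, the combined policy reduces exactly to the centralized one.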

Visual-Policy Learning through Multi-Camera View to Single-Camera View Knowledge Distillation for Robot Manipulation Tasks

Ensemble Bootstrapped Deep Deterministic Policy Gradient For Vision-Based Robotic Grasping

Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation

Modular Deep Q Networks for Sim-to-real Transfer of Visuo-motor Policies

An Efficient Generalizable Framework for Visuomotor Policies via Control-aware Augmentation and Privilege-guided Distillation

ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos

Vision-Based Robotic Object Grasping—A Deep Reinforcement Learning Approach

Know Thyself: Transferable Visual Control Policies Through Robot-Awareness

View-Invariant Policy Learning via Zero-Shot Novel View Synthesis

Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation

Affordance-Centric Policy Learning: Sample Efficient and Generalisable Robot Policy Learning using Affordance-Centric Task Frames

Self-Supervised Learning of Multi-Object Keypoints for Robotic Manipulation

VIEW: Visual Imitation Learning with Waypoints

Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration

Vision-Based Efficient Robotic Manipulation with a Dual-Streaming Compact Convolutional Transformer

Learning Multi-Arm Manipulation Through Collaborative Teleoperation

Manipulate by Seeing: Creating Manipulation Controllers from Pre-Trained Representations

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras

Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning