PoCo: Policy Composition from and for Heterogeneous Robot Learning

Lirui Wang, Jialiang Zhao, Yilun Du, Edward H. Adelson, Russ Tedrake
2024-02-04
Abstract: Training general robotic policies from heterogeneous data for different tasks is a significant challenge. Existing robotic datasets vary in different modalities such as color, depth, tactile, and proprioceptive information, and are collected in different domains such as simulation, real robots, and human videos. Current methods usually collect and pool all data from one domain to train a single policy to handle such heterogeneity in tasks and domains, which is prohibitively expensive and difficult. In this work, we present a flexible approach, dubbed Policy Composition, to combine information across such diverse modalities and domains for learning scene-level and task-level generalized manipulation skills, by composing different data distributions represented with diffusion models. Our method can use task-level composition for multi-task manipulation and be composed with analytic cost functions to adapt policy behaviors at inference time. We train our method on simulation, human, and real robot data and evaluate in tool-use tasks. The composed policy achieves robust and dexterous performance under varying scenes and tasks and outperforms baselines from a single data source in both simulation and real-world experiments. See
Robotics, Machine Learning
What problem does this paper attempt to address?
The paper addresses the challenge of learning robot policies from heterogeneous data, where the data spans different modalities (such as color, depth, tactile, and proprioceptive information) and different domains (such as simulation, real robots, and human videos). Current methods typically collect and pool all the data from one domain to train a single policy that handles the diversity of tasks and domains, which is both expensive and difficult.
The paper proposes Policy Composition (PoCo), which combines information across modalities and domains by composing the data distributions represented by diffusion models, in order to learn manipulation skills that generalize at the scene level and the task level. PoCo instantiates each policy as a trajectory-level diffusion model, so that policies can be combined through their score predictions. This allows task-level, behavior-level, and domain-level information from various data sources (such as simulation, human, and real robot data) to be combined without retraining, which supports generalization to new scenes and tasks. PoCo can use task-level composition for multi-task manipulation, and it can be composed with analytic cost functions at inference time to adapt policy behavior. In both simulation and real-world experiments, the composed policy shows robust and dexterous performance on tool-use tasks and outperforms baselines trained on a single data source.
Compared to approaches that learn general policies solely through representation learning or large-scale data pooling, PoCo does not require extensive data engineering to align observation and action spaces. Instead, it learns individual policies modularly on separate data domains. It can also adapt quickly to new data sources or tasks by training additional policies, without discarding what was learned from previous tasks. Finally, PoCo allows policies to be composed in arbitrary combinations, including combinations not seen during training, and performance can be further improved by composing multiple policy instances.
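The idea of combining policies through their score predictions, and of adding an analytic cost at inference time, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed per-policy weights, the plain weighted sum, and the smoothness cost are all assumptions chosen for illustration, and any noise-level scaling of the cost gradient is assumed to be folded into `cost_weight`. Each trajectory is a list of action vectors, and each policy is assumed to have already produced a noise prediction of the same shape for the current denoising step.

```python
from typing import List, Sequence

Trajectory = List[List[float]]  # horizon x action_dim


def compose_noise_predictions(
    predictions: Sequence[Trajectory], weights: Sequence[float]
) -> Trajectory:
    """Weighted sum of per-policy noise predictions (hypothetical composition rule)."""
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need exactly one weight per policy prediction")
    horizon, dim = len(predictions[0]), len(predictions[0][0])
    return [
        [sum(w * p[t][d] for w, p in zip(weights, predictions)) for d in range(dim)]
        for t in range(horizon)
    ]


def smoothness_cost_grad(trajectory: Trajectory) -> Trajectory:
    """Gradient of the illustrative cost 0.5 * sum_t ||a[t+1] - a[t]||^2."""
    horizon, dim = len(trajectory), len(trajectory[0])
    grad = [[0.0] * dim for _ in range(horizon)]
    for t in range(horizon - 1):
        for d in range(dim):
            diff = trajectory[t + 1][d] - trajectory[t][d]
            grad[t][d] -= diff      # d(cost)/d(a[t])
            grad[t + 1][d] += diff  # d(cost)/d(a[t+1])
    return grad


def guided_prediction(
    composed: Trajectory, trajectory: Trajectory, cost_weight: float
) -> Trajectory:
    """Add the cost gradient to the composed noise prediction.

    Since the score is proportional to the negative noise prediction,
    adding the cost gradient here steers sampling toward lower cost.
    """
    grad = smoothness_cost_grad(trajectory)
    return [
        [e + cost_weight * g for e, g in zip(e_row, g_row)]
        for e_row, g_row in zip(composed, grad)
    ]


# Usage: two assumed policies (e.g. one per data domain), equal weights.
eps_sim = [[1.0, 1.0], [1.0, 1.0]]
eps_real = [[3.0, 3.0], [3.0, 3.0]]
composed_eps = compose_noise_predictions([eps_sim, eps_real], [0.5, 0.5])
```

In a full sampler, the guided prediction would replace a single policy's noise prediction inside each denoising step, so adding or reweighting a policy would require no retraining of the others.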
The main contributions of the paper are: (a) proposing the PoCo framework, which combines information from different domains and modalities through probabilistic composition of diffusion models; (b) developing task-level, behavior-level, and domain-level compositions for constructing complex composite policies; (c) demonstrating the scene-level and task-level generalization of PoCo on four different tool-use tasks in both simulation and real-world settings, as well as its robust and dexterous behavior in scenes with disturbances and obstructions.