Abstract: Training general robotic policies from heterogeneous data for different tasks is a significant challenge. Existing robotic datasets vary in modalities such as color, depth, tactile, and proprioceptive information, and are collected in different domains such as simulation, real robots, and human videos. Current methods usually collect and pool all data from one domain to train a single policy to handle such heterogeneity in tasks and domains, which is prohibitively expensive and difficult. In this work, we present a flexible approach, dubbed Policy Composition, to combine information across such diverse modalities and domains for learning scene-level and task-level generalized manipulation skills, by composing different data distributions represented with diffusion models. Our method can use task-level composition for multi-task manipulation and be composed with analytic cost functions to adapt policy behaviors at inference time. We train our method on simulation, human, and real robot data and evaluate in tool-use tasks. The composed policy achieves robust and dexterous performance under varying scenes and tasks and outperforms baselines from a single data source in both simulation and real-world experiments. See

What problem does this paper attempt to address?

The paper addresses the challenge of learning robotic policies from heterogeneous data, where the data comes from different modalities (such as color, depth, tactile, and proprioceptive information) and different domains (such as simulation, real robots, and human videos). Current methods typically collect and pool all the data to train a single policy that handles the diversity of tasks and domains, but this approach is both expensive and difficult.

The paper proposes a method called "Policy Composition" (PoCo), which combines information from different modalities and domains using diffusion models to learn scene-level and task-level manipulation skills. PoCo can be used for multi-task manipulation and can be composed with analytic cost functions at inference time to adapt policy behavior. PoCo instantiates each policy as a trajectory-level diffusion model, which allows policies to be combined through their score predictions. This enables the combination of task-level, behavior-level, and domain-level information from various data sources (such as simulation, human, and real robot data) without retraining, thereby achieving generalization to new scenes and tasks. In both simulation and real-world experiments, PoCo demonstrates robust and dexterous performance in tool-use tasks, surpassing baseline methods trained on a single data source.

Compared to approaches that learn general policies solely through representation learning or large-scale data pooling, PoCo does not require extensive data engineering to align observation and action spaces. Instead, it learns individual policies modularly on separate data domains. It can also adapt quickly to new data sources or tasks by training additional policies without discarding the information from previous tasks. PoCo further allows arbitrary combinations of policy aspects, including combinations not seen during training, and performance can be improved further by combining policy instances.
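The phrase "combined through score predictions" can be made concrete with a generic identity for products of distributions. The notation below is an assumption introduced for illustration and is not taken from the paper: $\tau$ is a trajectory, $p_i$ is the trajectory distribution of the $i$-th policy, $w_i$ is a composition weight, and $c$ is an analytic cost. Taking the logarithm of a weighted product turns it into a weighted sum of scores:

```latex
\begin{aligned}
p_{\text{comp}}(\tau) &\propto \exp\bigl(-c(\tau)\bigr)\,\prod_{i} p_i(\tau)^{w_i}, \\
\nabla_{\tau} \log p_{\text{comp}}(\tau) &= \sum_{i} w_i\, \nabla_{\tau} \log p_i(\tau) \;-\; \nabla_{\tau} c(\tau).
\end{aligned}
```

The normalizing constant does not depend on $\tau$, so it drops out of the gradient. When this sum is applied at every noise level of a diffusion model it is generally an approximation, because the noised version of a product distribution is not the product of the noised distributions.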
The main contributions of the paper include: (a) proposing the PoCo framework, which combines information from different domains and modalities using probabilistic composition of diffusion models; (b) developing task-level, behavior-level, and domain-level temporal combinations to construct complex composite policies; (c) demonstrating the scene-level and task-level generalization abilities of PoCo in four different tool-use tasks in both simulation and real-world settings, as well as its robust and dexterous behavior in scenes with disturbances and obstructions.
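As a rough illustration of combining policies through their score predictions, the toy sketch below composes two policies and an optional analytic cost. It is not the paper's implementation: each "policy" is a hypothetical Gaussian over a three-step trajectory with a closed-form score, and `gaussian_score`, `quadratic_cost_grad`, `compose`, and `follow_score` are names invented for this example. Noise-free gradient steps stand in for the reverse diffusion process.

```python
from typing import Callable, List, Optional, Sequence

Trajectory = List[float]
ScoreFn = Callable[[Trajectory], Trajectory]


def gaussian_score(mean: Sequence[float], std: float) -> ScoreFn:
    """Score (gradient of the log-density) of an isotropic Gaussian.

    Assumption: a stand-in for a trained trajectory-level diffusion policy.
    """
    def score(traj: Trajectory) -> Trajectory:
        return [(m - x) / std ** 2 for x, m in zip(traj, mean)]
    return score


def quadratic_cost_grad(target: float, weight: float) -> ScoreFn:
    """Gradient of the hypothetical cost 0.5 * weight * sum((x - target)^2)."""
    def grad(traj: Trajectory) -> Trajectory:
        return [weight * (x - target) for x in traj]
    return grad


def compose(scores: Sequence[ScoreFn],
            weights: Sequence[float],
            cost_grad: Optional[ScoreFn] = None) -> ScoreFn:
    """Weighted sum of scores, minus the cost gradient if one is given."""
    def composed(traj: Trajectory) -> Trajectory:
        total = [0.0] * len(traj)
        for score, w in zip(scores, weights):
            total = [t + w * g for t, g in zip(total, score(traj))]
        if cost_grad is not None:
            total = [t - g for t, g in zip(total, cost_grad(traj))]
        return total
    return composed


def follow_score(score: ScoreFn, init: Trajectory,
                 step: float = 0.05, iters: int = 2000) -> Trajectory:
    """Deterministic ascent along the score; a simplification of sampling."""
    traj = list(init)
    for _ in range(iters):
        traj = [x + step * g for x, g in zip(traj, score(traj))]
    return traj


# Two hypothetical policies, e.g. one from simulation and one from real data.
sim_policy = gaussian_score([0.0, 0.0, 0.0], std=1.0)
real_policy = gaussian_score([1.0, 1.0, 1.0], std=0.5)
composed = compose([sim_policy, real_policy], [1.0, 1.0])
print(follow_score(composed, [5.0, -3.0, 0.0]))  # each entry approaches 0.8
```

In this toy setting the composed mode is the precision-weighted average of the two means (0.8), and passing `quadratic_cost_grad(2.0, 1.0)` as `cost_grad` shifts it to 1.0 with no retraining of either policy, which mirrors the inference-time adaptation described above.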

PoCo: Policy Composition from and for Heterogeneous Robot Learning
