Abstract: Data collection has become an increasingly important problem in robotic manipulation, yet there is still little understanding of how to collect data effectively to facilitate broad generalization. Recent works on large-scale robotic data collection typically vary many environmental factors (e.g., object types, table textures) during data collection, to cover a diverse range of scenarios. However, they do not explicitly account for the possible compositional abilities of policies trained on the data. If robot policies can compose environmental factors from their data to succeed when encountering unseen factor combinations, we can exploit this to avoid collecting data for situations that composition would address. To investigate this possibility, we conduct thorough empirical studies both in simulation and on a real robot that compare data collection strategies and assess whether visual imitation learning policies can compose environmental factors. We find that policies do exhibit composition, although leveraging prior robotic datasets is critical for this on a real robot. We use these insights to propose better in-domain data collection strategies that exploit composition, which can induce better generalization than naive approaches for the same amount of effort during data collection. We further demonstrate that a real robot policy trained on data from such a strategy achieves a success rate of 77.5% when transferred to entirely new environments that encompass unseen combinations of environmental factors, whereas policies trained using data collected without accounting for environmental variation fail to transfer effectively, with a success rate of only 2.5%. We provide videos at

What problem does this paper attempt to address?

The paper asks how to collect data for robotic manipulation so that the resulting policies generalize broadly, and in particular whether compositional generalization can reduce the amount of data required. Large-scale robot data collection efforts typically vary many environmental factors (for example, object types and table textures), but they do not explicitly account for whether policies can compose those factors to handle combinations never seen in training. The central hypothesis is that if policies can compose the environmental factors present in their training data, then data need not be collected for situations that composition already covers. To test this, the authors compare data collection strategies in simulation and on a real robot, and evaluate whether visual imitation learning policies compose environmental factors. The experiments show that policies do exhibit composition, although on a real robot this depends critically on leveraging prior robotic datasets. Based on these findings, the paper proposes in-domain data collection strategies that exploit composition, which induce better generalization than naive approaches for the same amount of data collection effort. On a real robot, a policy trained on data from such a strategy achieves a 77.5% success rate when transferred to entirely new environments containing unseen combinations of environmental factors, whereas policies trained on data collected without accounting for environmental variation reach only 2.5%. Overall, the paper addresses how to make robot data collection more efficient by understanding and exploiting compositional generalization.
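The argument about avoiding data collection for composable situations can be made concrete with a small counting sketch. The factor names, their values, and the "one factor at a time" scheme below are assumptions chosen for illustration only; they are not the strategies or factors evaluated in the paper. The sketch compares covering every factor combination with varying a single factor at a time from a base setting, and counts the combinations that a policy would have to handle through composition.

```python
from itertools import product

# Hypothetical factors and values, chosen only for illustration.
factors = {
    "object_type": ["mug", "bowl", "can"],
    "table_texture": ["wood", "marble", "cloth"],
    "camera_pose": ["left", "center", "right"],
}


def full_coverage(factors):
    """Every combination of factor values; the count grows multiplicatively."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def one_factor_at_a_time(factors):
    """Vary one factor at a time from a base setting; the count grows additively."""
    base = {name: values[0] for name, values in factors.items()}
    settings = [base]
    for name, values in factors.items():
        for value in values[1:]:
            settings.append({**base, name: value})
    return settings


all_settings = full_coverage(factors)
sparse_settings = one_factor_at_a_time(factors)

# Combinations never collected, which the policy would need to compose.
unseen = [s for s in all_settings if s not in sparse_settings]

print(f"full coverage:       {len(all_settings)} settings")
print(f"one factor at a time: {len(sparse_settings)} settings")
print(f"left to composition: {len(unseen)} settings")
```

With three factors of three values each, full coverage needs 27 settings while the sparse scheme needs 7, yet every individual factor value still appears in the collected data. Whether a policy actually succeeds on the remaining 20 combinations is the empirical question the paper studies.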

Efficient Data Collection for Robotic Manipulation via Compositional Generalization
