Abstract: Generalist robot manipulators need to learn a wide variety of manipulation skills across diverse environments. Current robot training pipelines rely on humans to provide kinesthetic demonstrations, or to program simulation environments and write reward functions for reinforcement learning. Such human involvement is a significant bottleneck to scaling up robot learning across diverse tasks and environments. We propose Generation to Simulation (Gen2Sim), a method for scaling up robot skill learning in simulation by automating the generation of 3D assets, task descriptions, task decompositions, and reward functions using large pre-trained generative models of language and vision. We generate 3D assets for simulation by lifting open-world 2D object-centric images to 3D using image diffusion models, and by querying LLMs to determine plausible physics parameters. Given URDF files of generated and human-developed assets, we use chain-of-thought prompting to have LLMs map these to relevant task descriptions, temporal decompositions, and corresponding Python reward functions for reinforcement learning. We show that Gen2Sim succeeds in learning policies for diverse long-horizon tasks on which reinforcement learning with reward functions that are not temporally decomposed fails. Gen2Sim provides a viable path for scaling up reinforcement learning for robot manipulators in simulation, both by diversifying and expanding task and environment development, and by facilitating the discovery of reinforcement-learned behaviors through temporal task decomposition. Our work contributes hundreds of simulated assets, tasks, and demonstrations, taking a step towards fully autonomous acquisition of robotic manipulation skills in simulation.
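To make the idea of a temporally decomposed reward concrete, the following is a minimal sketch, not code from Gen2Sim. It assumes a long-horizon task can be split into ordered subtasks, each with its own dense reward and success check, and that the agent is only rewarded for the subtask it is currently on. The names `Subtask`, `DecomposedReward`, `make_open_drawer_reward`, the state keys, and all thresholds are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical state representation: named scalar features of the scene.
State = Dict[str, float]


@dataclass
class Subtask:
    name: str
    reward: Callable[[State], float]  # dense shaping reward for this stage
    done: Callable[[State], bool]     # success check that unlocks the next stage


class DecomposedReward:
    """Rewards only the currently active subtask and advances on success."""

    def __init__(self, subtasks: List[Subtask], stage_bonus: float = 1.0):
        self.subtasks = subtasks
        self.stage_bonus = stage_bonus  # assumed bonus for finishing a stage
        self.stage = 0

    def __call__(self, state: State) -> float:
        if self.stage >= len(self.subtasks):
            return 0.0  # every subtask is already complete
        current = self.subtasks[self.stage]
        if current.done(state):
            self.stage += 1
            return self.stage_bonus
        return current.reward(state)


def make_open_drawer_reward() -> DecomposedReward:
    # Illustrative decomposition of "open the drawer": reach, then pull.
    return DecomposedReward([
        Subtask(
            name="reach_handle",
            reward=lambda s: -s["gripper_to_handle"],
            done=lambda s: s["gripper_to_handle"] < 0.02,
        ),
        Subtask(
            name="pull_drawer",
            reward=lambda s: s["drawer_opening"],
            done=lambda s: s["drawer_opening"] > 0.25,
        ),
    ])
```

In this sketch the pulling reward stays inactive until the gripper has reached the handle, so the learner always receives a signal about the next achievable step rather than a single sparse signal for the full task.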

GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation

A Human-Robot Collaboration System for Object Handover

Learning Human-to-Robot Dexterous Handovers for Anthropomorphic Hand

Robot-To-Human Handover with Obstacle Avoidance Via Continuous Time Recurrent Neural Network

Human-to-Robot Handover Based on Reinforcement Learning

Leveraging Semantic and Geometric Information for Zero-Shot Robot-to-Human Handover

Visualizing Robot Intent for Object Handovers with Augmented Reality

Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations

HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration

Learning Dynamic Robot-to-Human Object Handover from Human Feedback

Object-Independent Human-to-Robot Handovers Using Real Time Robotic Vision

ContactHandover: Contact-Guided Robot-to-Human Object Handover

Learning Generalizable 3D Manipulation With 10 Demonstrations

H2O: A Benchmark for Visual Human-human Object Handover Analysis

A Wearable Robotic Hand for Hand-over-Hand Imitation Learning

Dynamic Handover: Throw and Catch with Bimanual Hands

Learning Generalizable Dexterous Manipulation from Human Grasp Affordance

DexH2R: Task-oriented Dexterous Manipulation from Human to Robots

Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models

Fed-HANet: Federated Visual Grasping Learning for Human Robot Handovers

Hand-Object Interaction Pretraining from Videos