RoboScript: Code Generation for Free-Form Manipulation Tasks across Real
and Simulation
Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding, Ping Luo
2024-01-01
Abstract: Embodied AI has seen rapid progress in high-level task planning and code
generation for open-world robot manipulation. However, previous studies
put much effort into the general common sense reasoning and task planning
capabilities of large-scale language or multi-modal models, but relatively little
effort into ensuring the deployability of generated code on real robots, and into
other fundamental components of autonomous robot systems, including robot
perception, motion planning, and control. To bridge this “ideal-to-real” gap,
this paper presents RoboScript, a platform for 1) a deployable robot
manipulation pipeline powered by code generation; and 2) a code generation
benchmark for robot manipulation tasks in free-form natural language. The
RoboScript platform addresses this gap by emphasizing a unified interface
with both simulation and real robots, based on abstraction from the Robot
Operating System (ROS), ensuring syntax compliance and simulation validation
with Gazebo. We demonstrate the adaptability of our code generation framework
across multiple robot embodiments, including the Franka and UR5 robot arms, and
multiple grippers. Additionally, our benchmark assesses reasoning abilities for
physical space and constraints, highlighting the differences between GPT-3.5,
GPT-4, and Gemini in handling complex physical interactions. Finally, we
present a thorough evaluation of the whole system, exploring how each module in
the pipeline (code generation, perception, motion planning), as well as object
geometric properties, impacts the overall performance of the system.