Learning Generalizable Tool-use Skills through Trajectory Generation

Carl Qi, Yilin Wu, Lifan Yu, Haoyue Liu, Bowen Jiang, Xingyu Lin, David Held
2024-04-24
Abstract: Autonomous systems that efficiently utilize tools can assist humans in completing many common tasks such as cooking and cleaning. However, current systems fall short of human-level intelligence in adapting to novel tools. Prior affordance-based work often makes strong assumptions about the environment and cannot scale to more complex, contact-rich tasks. In this work, we tackle this challenge and explore how agents can learn to use previously unseen tools to manipulate deformable objects. We propose to learn a generative model of tool-use trajectories as a sequence of tool point clouds, which generalizes to different tool shapes. Given any novel tool, we first generate a tool-use trajectory and then optimize the sequence of tool poses to align with the generated trajectory. We train a single model on four challenging deformable object manipulation tasks, using demonstration data from only one tool per task. The model generalizes to various novel tools, significantly outperforming baselines. We further test our trained policy in the real world with unseen tools, where it achieves performance comparable to that of humans. Additional materials can be found on our project website: https://sites.google.com/view/toolgen
Robotics, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how autonomous systems can adapt to unfamiliar tools and use them to manipulate deformable objects such as dough. Current systems fall short of human-level intelligence when adapting to new tools, and existing affordance-based approaches often make strong assumptions about the environment and do not scale to more complex, contact-rich tasks. The authors propose a method called ToolGen, which learns a generative model of tool-use trajectories represented as sequences of tool point clouds; this representation lets it generalize to tools of different shapes. Given a new tool, ToolGen first generates a tool-use trajectory and then optimizes the sequence of tool poses to align with the generated trajectory. A single model is trained on four deformable object manipulation tasks using demonstration data from only one tool per task, yet it generalizes to a variety of novel tools and performs comparably to humans in real-world tests. The paper also compares ToolGen against several baseline methods and reports that it significantly outperforms them on novel tools.
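The second stage described above, fitting a sequence of tool poses to a generated trajectory of tool point clouds, can be illustrated with a small sketch. This is not the paper's implementation: it is a planar (2D) simplification that assumes each generated frame has the same number of points as the tool model and that the i-th points correspond, in which case the least-squares rigid pose has a closed form. The function names `fit_planar_pose`, `apply_pose`, and `fit_pose_sequence` are hypothetical, and the actual method may use a different alignment objective and optimizer.

```python
import math


def fit_planar_pose(tool_points, target_points):
    """Least-squares rigid pose (theta, tx, ty) mapping tool_points onto target_points.

    Assumption (not from the paper): points are 2D and the i-th tool point
    corresponds to the i-th target point.
    """
    n = len(tool_points)
    if n == 0 or n != len(target_points):
        raise ValueError("point sets must be non-empty and of equal length")

    # Centroids of both point sets.
    pcx = sum(x for x, _ in tool_points) / n
    pcy = sum(y for _, y in tool_points) / n
    qcx = sum(x for x, _ in target_points) / n
    qcy = sum(y for _, y in target_points) / n

    # Accumulate dot and cross products of the centered correspondences;
    # their ratio gives the optimal rotation angle in the plane.
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(tool_points, target_points):
        ax, ay = px - pcx, py - pcy
        bx, by = qx - qcx, qy - qcy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # Translation moves the rotated tool centroid onto the target centroid.
    tx = qcx - (c * pcx - s * pcy)
    ty = qcy - (s * pcx + c * pcy)
    return theta, tx, ty


def apply_pose(points, pose):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def fit_pose_sequence(tool_points, generated_trajectory):
    """Fit one rigid pose per frame of a generated point-cloud trajectory."""
    return [fit_planar_pose(tool_points, frame) for frame in generated_trajectory]
```

In this sketch each frame is fitted independently. A fuller treatment would work in 3D, would not assume known correspondences, and might add smoothness or reachability constraints across the pose sequence.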