GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping

Chao Tang, Dehao Huang, Wenqi Ge, Weiyu Liu, Hong Zhang
2023-09-20
Abstract: Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that enable subsequent manipulation tasks. To model the complex relationships between objects, tasks, and grasps, existing methods incorporate semantic knowledge as priors into TOG pipelines. However, the existing semantic knowledge is typically constructed from closed-world concept sets, which restricts generalization to novel concepts outside the pre-defined sets. To address this issue, we propose GraspGPT, a large language model (LLM) based TOG framework that leverages the open-ended semantic knowledge from an LLM to achieve zero-shot generalization to novel concepts. We conduct experiments on the Language Augmented TaskGrasp (LA-TaskGrasp) dataset and demonstrate that GraspGPT outperforms existing TOG methods on different held-out settings when generalizing to novel concepts outside the training set. The effectiveness of GraspGPT is further validated in real-robot experiments. Our code, data, appendix, and video are publicly available at https://sites.google.com/view/graspgpt/.
Robotics
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses a key issue in task-oriented grasping (TOG): how to predict grasps that enable subsequent manipulation tasks when the instruction involves concepts not seen during training. Existing methods typically model the complex relationships between objects, tasks, and grasps by incorporating semantic knowledge as a prior in the TOG pipeline. However, this knowledge is usually built on a closed-world set of concepts, which limits generalization to new concepts.

To overcome this limitation, the paper proposes GraspGPT, a TOG framework based on a large language model (LLM). GraspGPT leverages the open-ended semantic knowledge in an LLM to achieve zero-shot generalization to novel concepts. Specifically, when a user provides a natural language instruction containing a novel concept, GraspGPT prompts the LLM to generate language description paragraphs for that concept, connecting it to related concepts described during training. This lets the robot extend learned TOG skills from known concepts to new ones.

### Main Contributions

1. **Proposing GraspGPT**: An LLM-based TOG framework that uses open-ended semantic knowledge to achieve zero-shot generalization to novel concepts.
2. **Constructing the LA-TaskGrasp Dataset**: A language-augmented TOG dataset containing automatically generated language descriptions of concepts, used to evaluate GraspGPT.

### Experimental Validation

- **Perception Experiments**: Experiments on the LA-TaskGrasp dataset show that GraspGPT outperforms existing TOG methods under different held-out settings (e.g., unseen object categories and unseen tasks).
- **Real Robot Experiments**: GraspGPT is deployed on a Kinova Gen3 robotic arm to verify its effectiveness in real-world use. The results indicate that GraspGPT performs well at both task-oriented grasping and tool manipulation.
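
The held-out settings mentioned in the experiments can be illustrated with a small sketch. It assumes, purely for illustration, that each sample records an object class, a task, a candidate grasp index, and a binary label; the `Sample` type and the `held_out_split` helper are hypothetical names and are not taken from the paper's code. The point is only that every held-out concept appears exclusively in the test set, so the model never sees it during training.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    object_class: str   # e.g. "mug"
    task: str           # e.g. "pour"
    grasp_id: int       # index of a candidate grasp on the object
    label: bool         # True if the grasp suits the task


def held_out_split(samples, key, held_out):
    """Split samples so every concept in `held_out` appears only in the test set.

    `key` is "object_class" or "task" and selects which concept type is held out.
    """
    held_out = set(held_out)
    train = [s for s in samples if getattr(s, key) not in held_out]
    test = [s for s in samples if getattr(s, key) in held_out]
    return train, test


# Toy data, invented for this example only.
samples = [
    Sample("mug", "pour", 0, True),
    Sample("mug", "scoop", 1, False),
    Sample("spatula", "flip", 0, True),
    Sample("ladle", "scoop", 2, True),
]

# Held-out object class: "ladle" is never seen in training.
train, test = held_out_split(samples, key="object_class", held_out={"ladle"})

# Held-out task: "scoop" is never seen in training.
task_train, task_test = held_out_split(samples, key="task", held_out={"scoop"})
```

Splitting by concept rather than by individual sample is what makes the evaluation a test of generalization to novel concepts instead of generalization to new instances of familiar ones.
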
### Conclusion

By leveraging the open-ended semantic knowledge in LLMs, GraspGPT addresses the generalization problem that existing methods face with novel concepts, providing a new approach to task-oriented grasping.
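
The prompting step described earlier (asking an LLM for description paragraphs about a novel object class or task) might look roughly like the following. This is a minimal sketch under stated assumptions: the prompt templates are hypothetical and the paper's actual wording may differ, `describe_concept` is an illustrative helper, and `placeholder_llm` is a stand-in for a real LLM client so the example runs without network access.

```python
from typing import Callable, Dict, List

# Hypothetical prompt templates; the paper's actual prompts may differ.
TEMPLATES: Dict[str, List[str]] = {
    "object": [
        "Describe the shape and parts of a {concept}.",
        "Describe what a {concept} is commonly used for.",
    ],
    "task": [
        "Describe what it means to {concept} with a household object.",
        "Name household objects that can be used to {concept}.",
    ],
}


def describe_concept(concept: str, kind: str, llm: Callable[[str], str]) -> List[str]:
    """Return one generated description paragraph per template for `concept`."""
    prompts = [template.format(concept=concept) for template in TEMPLATES[kind]]
    return [llm(prompt) for prompt in prompts]


def placeholder_llm(prompt: str) -> str:
    """Stand-in for a real LLM call: wraps the prompt in a placeholder string."""
    return f"[description for: {prompt}]"


# A novel object class and a novel task from a user instruction.
object_paragraphs = describe_concept("ladle", "object", placeholder_llm)
task_paragraphs = describe_concept("scoop", "task", placeholder_llm)
```

In a real pipeline the returned paragraphs would be encoded and passed to the grasp evaluator alongside the object observation; that part is omitted here because the summary does not describe it.
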