SegGrasp: Zero-Shot Task-Oriented Grasping via Semantic and Geometric Guided Segmentation

Haosheng Li, Weixin Mao, Weipeng Deng, Chenyu Meng, Rui Zhang, Fan Jia, Tiancai Wang, Haoqiang Fan, Hongan Wang, Xiaoming Deng
2024-10-14
Abstract: Task-oriented grasping, which involves grasping specific parts of objects based on their functions, is crucial for developing advanced robotic systems capable of performing complex tasks in dynamic environments. In this paper, we propose a training-free framework that incorporates both semantic and geometric priors for zero-shot task-oriented grasp generation. The proposed framework, SegGrasp, first leverages vision-language models such as GLIP for coarse segmentation. It then uses detailed geometric information from convex decomposition to improve segmentation quality through a fusion policy named GeoFusion. With the improved segmentation, a grasping network can generate an effective grasp pose. We conducted experiments on both a segmentation benchmark and real-world robot grasping. The experimental results show that SegGrasp surpasses the baseline by more than 15% in grasp and segmentation performance.
Robotics
### What problem does this paper attempt to solve?

This paper addresses **task-oriented grasping**: getting a robot to grasp the functional part of an object that suits the task at hand, including in dynamic environments. It proposes **SegGrasp**, a training-free (zero-shot) framework that combines semantic and geometric information to generate task-oriented grasp poses.

#### Main problems include:

1. **Requirements for task-oriented grasping**:
   - Robots need to grasp different functional parts of an object depending on the task. For example, to drive a nail, the robot should grasp a hammer by its handle rather than by its head.
   - Such functional grasping is crucial for performing complex tasks, especially in dynamic environments.
2. **Limitations of existing methods**:
   - **Traditional methods**: These rely on grasping priors learned from collected data and generalize poorly to unseen objects or scenes.
   - **Methods based on large language models (LLMs)**: These generalize well but make little use of geometric information, which limits grasp accuracy.
   - **Insufficient use of geometric information**: Some methods do not fully exploit geometric features, which leads to unstable grasping performance.
3. **Challenges of zero-shot learning**:
   - How to achieve task-oriented grasping of new objects or scenes without additional training data.

### Solutions

To overcome the above problems, the SegGrasp framework adopts the following strategies:

- **Semantically guided coarse segmentation**: Use vision-language models (such as GLIP and Grounding DINO) to produce an initial coarse segmentation that captures the semantic information of the object.
- **Geometrically guided fine segmentation**: Obtain geometric information through convex decomposition, then apply a fusion strategy named GeoFusion to improve the segmentation quality.
- **Grasp pose generation**: Feed the improved segmentation to Contact-GraspNet to generate high-quality grasp poses.

With these steps, SegGrasp is reported to surpass the baseline by more than 15% in both segmentation and grasp performance, evaluated on a segmentation benchmark and in real-world robot grasping.

### Summary

The core problem of this paper is to develop a task-oriented grasping framework that adapts to new scenarios without additional training. By combining semantic and geometric information, SegGrasp improves the accuracy and robustness of grasping in the zero-shot setting.
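
The summary does not spell out how GeoFusion combines the coarse semantic mask with the convex parts, so the following is only a minimal sketch of one plausible fusion rule, not the authors' method. It assumes the object is a point cloud indexed by integers, that the coarse mask and each convex part are sets of point indices, and that each grasp candidate is tied to one contact point. The names `fuse_parts`, `select_grasps`, and `min_overlap` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def fuse_parts(coarse: Set[int], parts: List[Set[int]],
               min_overlap: float = 0.5) -> Set[int]:
    """Refine a coarse semantic mask using geometric parts (illustrative rule).

    A part is kept whole when at least `min_overlap` of its points lie in
    the coarse mask, and dropped whole otherwise. This snaps a noisy mask
    to part boundaries. The threshold is an assumption, not a paper value.
    """
    refined: Set[int] = set()
    for part in parts:
        if not part:
            continue  # skip empty parts to avoid division by zero
        overlap = len(part & coarse) / len(part)
        if overlap >= min_overlap:
            refined |= part
    return refined


def select_grasps(grasps: Iterable[Tuple[str, int]],
                  target: Set[int]) -> List[str]:
    """Keep the ids of grasps whose contact point lies in the target region."""
    return [grasp_id for grasp_id, contact in grasps if contact in target]


# Toy object with 10 points: part 0 stands in for a handle, part 1 for a head.
parts = [{0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}]
# Noisy coarse mask for "handle": misses point 4 and leaks onto point 7.
coarse = {0, 1, 2, 3, 7}

handle = fuse_parts(coarse, parts)
grasps = [("g0", 2), ("g1", 7), ("g2", 4)]
print(sorted(handle))                 # [0, 1, 2, 3, 4]
print(select_grasps(grasps, handle))  # ['g0', 'g2']
```

In this toy case the handle part overlaps the coarse mask at 4 of 5 points and is kept whole, while the head part overlaps at 1 of 5 and is dropped, so the refined mask recovers the missed point and discards the leaked one. A real pipeline would take the mask from a vision-language model, the parts from a convex decomposition tool, and the candidates from a grasping network.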