ICGNet: A Unified Approach for Instance-Centric Grasping

René Zurbrügg, Yifan Liu, Francis Engelmann, Suryansh Kumar, Marco Hutter, Vaishakh Patil, Fisher Yu
2024-05-10
Abstract: Accurate grasping is the key to several robotic tasks including assembly and household robotics. Executing a successful grasp in a cluttered environment requires multiple levels of scene understanding: First, the robot needs to analyze the geometric properties of individual objects to find feasible grasps. These grasps need to be compliant with the local object geometry. Second, for each proposed grasp, the robot needs to reason about the interactions with other objects in the scene. Finally, the robot must compute a collision-free grasp trajectory while taking into account the geometry of the target object. Most grasp detection algorithms directly predict grasp poses in a monolithic fashion, which does not capture the composability of the environment. In this paper, we introduce an end-to-end architecture for object-centric grasping. The method uses pointcloud data from a single arbitrary viewing direction as an input and generates an instance-centric representation for each partially observed object in the scene. This representation is further used for object reconstruction and grasp detection in cluttered table-top scenes. We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets, indicating superior performance for grasping and reconstruction. Additionally, we demonstrate real-world applicability by decluttering scenes with varying numbers of objects.
Robotics, Computer Vision and Pattern Recognition
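The abstract's second and third levels of scene understanding (reasoning about interactions with other objects and keeping the grasp collision-free) become simple geometric checks once every object has its own reconstructed shape. The following is a minimal NumPy sketch of that idea, not the paper's implementation. It assumes that each instance's reconstruction is available as a boolean voxel grid on a shared grid (`origin`, `voxel_size`) and that each candidate grasp is represented by a set of gripper sample points in the same frame. The functions `occupied` and `select_grasp` and this gripper model are hypothetical.

```python
import numpy as np


def occupied(points, occupancy, origin, voxel_size):
    """Boolean mask: True for each point inside an occupied voxel.

    Points that fall outside the grid are treated as free space.
    """
    idx = np.floor((points - origin) / voxel_size).astype(int)
    in_bounds = np.all((idx >= 0) & (idx < np.array(occupancy.shape)), axis=1)
    hits = np.zeros(len(points), dtype=bool)
    i = idx[in_bounds]
    hits[in_bounds] = occupancy[i[:, 0], i[:, 1], i[:, 2]]
    return hits


def select_grasp(grasp_points, scores, instance_occupancies, target,
                 origin, voxel_size):
    """Index of the best-scoring grasp that avoids every non-target instance.

    grasp_points: (G, P, 3) hypothetical gripper sample points per grasp
    scores: (G,) predicted grasp success probabilities
    instance_occupancies: {instance_id: boolean voxel grid of its reconstruction}
    Returns None if every candidate collides with another instance.
    """
    grasp_points = np.asarray(grasp_points, dtype=float)
    scores = np.asarray(scores, dtype=float)
    feasible = np.ones(len(scores), dtype=bool)
    for inst_id, occ in instance_occupancies.items():
        if inst_id == target:
            continue  # this sketch only checks the *other* objects
        for g in range(len(scores)):
            if occupied(grasp_points[g], occ, origin, voxel_size).any():
                feasible[g] = False
    if not feasible.any():
        return None
    return int(np.argmax(np.where(feasible, scores, -np.inf)))


# Toy scene: target instance 1 and a neighboring instance 2 on a 4x4x4 grid.
occ_target = np.zeros((4, 4, 4), dtype=bool)
occ_target[1, 1, 1] = True
occ_neighbor = np.zeros((4, 4, 4), dtype=bool)
occ_neighbor[2, 2, 2] = True
grasps = [
    [[2.5, 2.5, 2.5], [0.5, 0.5, 0.5]],  # passes through the neighbor
    [[0.5, 0.5, 0.5], [1.5, 1.5, 1.5]],  # only touches the target
]
best = select_grasp(grasps, [0.9, 0.6], {1: occ_target, 2: occ_neighbor},
                    target=1, origin=np.zeros(3), voxel_size=1.0)
print(best)  # prints 1: the higher-scoring grasp collides with instance 2
```

Checking a handful of sample points is a coarse stand-in for a full collision check. A real system would also check the approach and lift trajectory, which the abstract lists as a separate requirement.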
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses accurate, collision-free robotic grasping of objects in cluttered environments. The authors propose a new method, ICGNet (Instance-Centric Grasping Network), for object-level grasping and shape prediction. The method primarily targets the following issues:

1. **Multi-level Scene Understanding**:
   - **Geometric Attribute Analysis**: The robot must analyze the geometry of individual objects to find feasible grasps, and these grasps must conform to each object's local geometry.
   - **Interaction Reasoning**: For each proposed grasp, the robot must reason about interactions with the other objects in the scene.
   - **Collision-free Trajectory Planning**: The robot must compute a collision-free grasp trajectory that accounts for the geometry of the target object.
2. **Limitations of Existing Methods**:
   - **Monolithic Scene Processing**: Most existing grasp detection algorithms predict grasp poses directly for the whole scene in a monolithic fashion, which does not capture the composability of the environment.
   - **Complexity and Suboptimal Predictions**: Many methods rely on given segmentation masks, object templates, or similar inputs, which add complexity and can lead to suboptimal predictions under severe occlusion.
3. **Goal-driven Grasping**:
   - **Instance-level Grasping**: Existing methods usually do not distinguish between individual object instances, which makes goal-driven tasks (such as "grasp the first object" or "grasp the bottle") difficult.
   - **Collision Detection**: Existing methods may not ensure that objects remain collision-free once a grasp is executed.

### Solution

The authors propose an end-to-end architecture that generates an instance-level representation of each observed object from single-viewpoint point cloud data. This representation is then used for object reconstruction and grasp detection.
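To make the idea of per-instance representations concrete, here is a minimal NumPy sketch of refining instance queries with masked cross-attention followed by self-attention. This is the mechanism the contribution list names as "iterative mask cross-attention and self-attention". The sketch follows the general Mask2Former-style recipe and is illustrative only. It assumes plain dot-product attention with residual updates and omits learned projections, multiple heads, layer normalization, and feed-forward layers. The toy `point_feats` stand in for the paper's sparse voxel and surface features, and all function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerically stable
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def predict_masks(queries, point_feats):
    """Mask logits: similarity between each instance query and each point."""
    return queries @ point_feats.T  # (Q, N)


def masked_cross_attention(queries, point_feats, mask_logits):
    """Each query attends only to the points its current mask assigns to it."""
    d = queries.shape[1]
    logits = queries @ point_feats.T / np.sqrt(d)  # (Q, N)
    allowed = mask_logits > 0  # same as sigmoid(logit) > 0.5
    # A query with an empty mask may attend everywhere (as in Mask2Former).
    allowed[~allowed.any(axis=1)] = True
    logits = np.where(allowed, logits, -np.inf)
    return queries + softmax(logits, axis=1) @ point_feats  # residual update


def self_attention(queries):
    """Queries exchange information so they can specialize on different objects."""
    d = queries.shape[1]
    attn = softmax(queries @ queries.T / np.sqrt(d), axis=1)
    return queries + attn @ queries


def refine(queries, point_feats, n_iters=3):
    """Alternate mask prediction, masked cross-attention, and self-attention."""
    for _ in range(n_iters):
        masks = predict_masks(queries, point_feats)
        queries = masked_cross_attention(queries, point_feats, masks)
        queries = self_attention(queries)
    return queries, predict_masks(queries, point_feats) > 0


# Two toy "objects": points 0-1 and points 2-3 have opposite features.
point_feats = np.array([[1.0, -1.0], [1.0, -0.8], [-1.0, 1.0], [-0.8, 1.0]])
queries = np.array([[1.0, -1.0], [-1.0, 1.0]])
refined, masks = refine(queries, point_feats)
print(masks.astype(int))  # each query ends up owning one cluster of points
```

Restricting each query's attention to its own predicted mask lets a query aggregate features from a single object rather than from the whole scene. The per-instance query can then feed instance-specific heads, such as occupancy or grasp prediction.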
The main contributions of ICGNet are:

1. **Instance-level Feature Extraction**:
   - A sparse feature volume that combines voxel and surface features is used to extract instance-level information for each object.
   - Instance queries are refined through iterative mask cross-attention and self-attention, so that each query focuses on a specific object instance.
2. **Contact-point Grasp Representation**:
   - A contact-point-based grasp representation predicts the success probability of different grasp directions, which yields more diverse grasp proposals.
3. **Multi-task Joint Prediction**:
   - The network jointly predicts semantic classes, instance labels, occupancy values, and grasp success probabilities, providing a unified approach to scene understanding, reconstruction, and grasp detection.

### Experimental Results

The authors evaluate ICGNet extensively on synthetic datasets. ICGNet outperforms state-of-the-art methods in grasp success rate (GSR), decluttering rate (DR), and reconstruction quality (Chamfer-L1 distance and IoU). The authors also demonstrate real-world applicability by decluttering scenes with varying numbers of objects.

### Conclusion

By grasping and predicting shapes at the level of individual object instances, ICGNet improves robotic grasping in cluttered environments. Because it reasons about each object separately, it is particularly well suited to goal-driven grasping tasks. The method performs strongly on synthetic benchmarks and carries over to real-world decluttering experiments.
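The evaluation metrics named under Experimental Results can be sketched as follows. These use common definitions from the grasping and occupancy-network literature and are assumptions here, not taken from the paper. GSR is the fraction of successful grasp attempts. DR is the fraction of objects removed, pooled over decluttering rounds. Chamfer-L1 is the symmetric mean of unsquared nearest-neighbor Euclidean distances between sampled surface points. IoU is computed between boolean occupancy grids. The paper's exact protocol (point sampling density, scaling, how rounds are counted) may differ, and the function names are hypothetical.

```python
import numpy as np


def grasp_success_rate(outcomes):
    """GSR: successful grasps divided by attempted grasps."""
    return float(np.mean(np.asarray(outcomes, dtype=bool)))


def decluttering_rate(removed_per_round, objects_per_round):
    """DR: objects removed divided by objects placed, pooled over all rounds."""
    return sum(removed_per_round) / sum(objects_per_round)


def chamfer_l1(a, b):
    """Symmetric Chamfer distance using unsquared Euclidean NN distances."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (|a|, |b|)
    # Average the a->b (accuracy) and b->a (completeness) terms.
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))


def volumetric_iou(occ_pred, occ_gt):
    """IoU of two boolean occupancy grids (two empty grids count as 1.0)."""
    inter = np.logical_and(occ_pred, occ_gt).sum()
    union = np.logical_or(occ_pred, occ_gt).sum()
    return float(inter / union) if union > 0 else 1.0


print(grasp_success_rate([True, False, True, True]))  # 0.75
print(decluttering_rate([4, 3], [5, 5]))  # 0.7
print(chamfer_l1([[0, 0, 0], [1, 0, 0]], [[0, 0, 0]]))  # 0.25
print(volumetric_iou(np.array([1, 1, 0], dtype=bool),
                     np.array([1, 0, 0], dtype=bool)))  # 0.5
```

The brute-force pairwise distance matrix in `chamfer_l1` is fine for small point sets. For the tens of thousands of points typically sampled in reconstruction benchmarks, a KD-tree nearest-neighbor query would be the usual replacement.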