ASGrasp: Generalizable Transparent Object Reconstruction and Grasping from RGB-D Active Stereo Camera

Jun Shi, Yong A, Yixiang Jin, Dingzhe Li, Haoyu Niu, Zhezhu Jin, He Wang
2024-05-09
Abstract: In this paper, we tackle the problem of grasping transparent and specular objects. This problem is important, yet it remains unsolved in robotics because depth cameras fail to recover the accurate geometry of such objects. For the first time, we propose ASGrasp, a 6-DoF grasp detection network that uses an RGB-D active stereo camera. ASGrasp uses a two-layer learning-based stereo network to reconstruct transparent objects, enabling material-agnostic object grasping in cluttered environments. In contrast to existing RGB-D based grasp detection methods, which depend heavily on depth restoration networks and on the quality of the depth maps generated by depth cameras, our system uses raw IR and RGB images directly to reconstruct transparent object geometry. We create an extensive synthetic dataset through domain randomization, based on GraspNet-1Billion. Our experiments demonstrate that ASGrasp achieves a success rate of over 90% for generalizable transparent object grasping in both simulation and the real world via seamless sim-to-real transfer. Our method significantly outperforms SOTA networks and even surpasses the performance upper bound set by perfect visible point cloud inputs. Project page:
Robotics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper aims to address the issue of robotic grasping of transparent and specular objects. This problem is of significant importance in the field of robotics but has not been effectively solved yet, mainly because existing depth cameras cannot accurately recover the geometry of these objects.

Specifically, the paper proposes a method called ASGrasp, which uses an RGB-D active stereo camera for 6 degrees of freedom (6-DoF) grasp detection. ASGrasp reconstructs the geometry of transparent objects by leveraging a two-layer learning-based stereo network, enabling material-agnostic object grasping that works effectively even in cluttered environments.

### Main Contributions

1. **Proposed a novel RGB-aware two-layer stereo network**: This network can reconstruct general transparent objects from an RGB-D active stereo camera.
2. **Achieved over 90% success rate for the first time**: In both simulation and real-world scenarios, ASGrasp can achieve a high success rate in grasping general transparent objects without any real training data.

### Method Overview

- **Scene Reconstruction Module**: Utilizes RGB images and left-right infrared images, aligns the infrared images to the RGB reference coordinate system through differentiable bilinear sampling to construct a cost volume, and uses a GRU network for stereo matching. Additionally, a second-layer depth branch is introduced to predict the depth of invisible parts.
- **Grasp Detection Module**: Based on the two-stage grasping network GSNet, it uses rich point cloud information to predict more accurate grasping poses.

### Experimental Results

- **Depth Completion Experiments**: On the DREDS-CatKnown and STD-GraspNet test sets, ASGrasp performs best in first-layer depth completion and also shows excellent performance in second-layer depth completion.
- **Grasping Performance Experiments**: In both simulation and real-world scenarios, ASGrasp significantly outperforms existing methods, especially when handling transparent objects, achieving a success rate of over 90%.

### Conclusion

The paper proposes a 6-DoF grasping method ASGrasp based on an active stereo camera, which effectively solves the problem of grasping transparent and specular objects. By using a two-layer learning-based stereo network and a large-scale synthetic dataset, ASGrasp achieves significant improvements in both depth completion and grasping performance.
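
### Illustrative Sketch: Bilinear Sampling and a Cost Volume

The Scene Reconstruction Module above builds a cost volume by warping infrared features into the RGB reference frame with bilinear sampling. The following NumPy sketch illustrates that general idea only; it is not the authors' implementation. It assumes a rectified image pair in which a candidate disparity is a pure horizontal shift, and it scores each candidate with a plain dot product. The function names `bilinear_sample` and `correlation_cost_volume` are hypothetical, and the paper's actual warping (which relies on the camera geometry between the IR and RGB sensors) and its GRU-based matching are not reproduced here.

```python
import numpy as np


def bilinear_sample(feat, x, y):
    """Sample a feature map feat (H, W, C) at fractional pixel positions.

    x and y are arrays of the same shape holding column and row coordinates.
    Positions outside the image are clamped to the border (an assumption of
    this sketch).
    """
    h, w, _ = feat.shape
    x = np.clip(x, 0.0, w - 1.0)
    y = np.clip(y, 0.0, h - 1.0)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]  # weight of the right-hand neighbour
    wy = (y - y0)[..., None]  # weight of the lower neighbour
    top = (1.0 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1.0 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1.0 - wy) * top + wy * bottom


def correlation_cost_volume(ref_feat, src_feat, disparities):
    """Build a (D, H, W) correlation volume over candidate disparities.

    For each candidate d, source features are sampled at column x - d and
    compared with the reference features by a per-pixel dot product.
    """
    h, w, _ = ref_feat.shape
    ys, xs = np.meshgrid(
        np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij"
    )
    volume = np.empty((len(disparities), h, w))
    for i, d in enumerate(disparities):
        warped = bilinear_sample(src_feat, xs - d, ys)
        volume[i] = (ref_feat * warped).sum(axis=-1)
    return volume
```

Given unit-normalized features, the candidate with the highest correlation at a pixel is the best match under this simplified model. Because the sampling is a weighted sum of neighbouring features, the same operation written in an autodiff framework would be differentiable with respect to the features, which is what allows a network to be trained through the warping step.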