GraspNeRF: Multiview-based 6-DoF Grasp Detection for Transparent and Specular Objects Using Generalizable NeRF

Qiyu Dai, Yan Zhu, Yiran Geng, Ciyu Ruan, Jiazhao Zhang, He Wang
DOI: https://doi.org/10.48550/arXiv.2210.06575
2023-03-16
Abstract: In this work, we tackle 6-DoF grasp detection for transparent and specular objects, which is an important yet challenging problem in vision-based robotic systems, due to the failure of depth cameras in sensing their geometry. We, for the first time, propose a multiview RGB-based 6-DoF grasp detection network, GraspNeRF, that leverages the generalizable neural radiance field (NeRF) to achieve material-agnostic object grasping in clutter. Compared to the existing NeRF-based 3-DoF grasp detection methods that rely on densely captured input images and time-consuming per-scene optimization, our system can perform zero-shot NeRF construction with sparse RGB inputs and reliably detect 6-DoF grasps, both in real-time. The proposed framework jointly learns generalizable NeRF and grasp detection in an end-to-end manner, optimizing the scene representation construction for the grasping. For training data, we generate a large-scale photorealistic domain-randomized synthetic dataset of grasping in cluttered tabletop scenes that enables direct transfer to the real world. Our extensive experiments in synthetic and real-world environments demonstrate that our method significantly outperforms all the baselines in all the experiments while remaining in real-time. Project page: https://pku-epic.github.io/GraspNeRF
Robotics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses 6-degrees-of-freedom (6-DoF) grasp detection for transparent and specular objects in vision-based robotic systems. The problem is important but challenging because depth cameras fail to sense the geometry of such objects. The paper proposes GraspNeRF, a 6-DoF grasp detection network that takes multi-view RGB images as input and uses a generalizable neural radiance field (NeRF) to achieve material-agnostic grasping in cluttered scenes.

### Main Problems

1. **Grasp Detection for Transparent and Specular Objects**:
   - Depth cameras perform poorly on transparent and specular objects, so the geometry of these objects cannot be sensed accurately.
   - Existing methods usually rely on depth images, which contain missing or incorrect values on transparent and specular surfaces.
2. **Real-Time Performance and Generalization Ability**:
   - Existing NeRF-based methods require densely captured input images and time-consuming per-scene optimization, so they cannot grasp in real time.
   - These methods must retrain the NeRF after each grasp in multi-object sequential grasping, which further increases time and computational cost.

### Solutions

The paper proposes GraspNeRF, a 6-DoF grasp detection network based on multi-view RGB images, with the following characteristics:

- **Generalizable NeRF**: Uses a generalizable NeRF (such as MVSNeRF and NeuRay) to aggregate features from multi-view observations, with no per-scene training.
- **Sparse Input**: Requires only a small number (6) of sparse RGB input images to achieve real-time grasping.
- **End-to-End Learning**: Jointly learns the generalizable NeRF and grasp detection, so that the scene representation is optimized for grasping performance.
- **Large-Scale Synthetic Dataset**: Generates a large-scale, photorealistic, domain-randomized synthetic dataset containing 2.4 million images and 100,000 scenes for training, which lets the model transfer directly to new real-world scenes.

### Experimental Results

- **Simulation Experiments**:
  - Single-Object Grasping: GraspNeRF has an average success rate of 86.1% in 36 trials, significantly outperforming all baseline methods.
  - Sequential Clutter Removal: In both stacked ("pile") and arranged ("packed") scenes, the grasp success rate and clearance rate of GraspNeRF on transparent and specular objects are significantly higher than those of the other methods.
- **Real-Robot Experiments**:
  - Single-Object Grasping: GraspNeRF has a success rate of 88.9% in 18 trials, outperforming all baseline methods.
  - Sequential Clutter Removal: In both stacked and arranged scenes, GraspNeRF has the highest grasp success rate and clearance rate across the tested materials.

### Conclusion

GraspNeRF achieves efficient, real-time grasp detection for transparent and specular objects by combining a generalizable NeRF with sparse RGB inputs. The method performs well in both simulation and real-robot experiments and generalizes to unseen real-world scenes without retraining.
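
To make the reported metrics concrete, the following minimal sketch shows how a grasp success rate and a clearance rate could be computed from raw trial counts. The function names and metric definitions are assumptions for illustration (success rate as successful grasps over attempts, clearance rate as removed objects over objects initially in the scene) and are not taken from the paper's code. The counts 16/18 and 31/36 are only what the reported 88.9% and 86.1% would imply if each figure is a plain ratio over its trials; the summary does not state them.

```python
def success_rate(successes: int, attempts: int) -> float:
    """Percentage of grasp attempts that succeeded (assumed definition)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


def clearance_rate(removed: int, total_objects: int) -> float:
    """Percentage of objects removed from the scene (assumed definition)."""
    if total_objects <= 0:
        raise ValueError("total_objects must be positive")
    return 100.0 * removed / total_objects


def implied_successes(rate_percent: float, trials: int) -> int:
    """Nearest whole number of successes consistent with a reported rate."""
    return round(rate_percent / 100.0 * trials)


if __name__ == "__main__":
    # Hypothetical back-calculation from the percentages quoted above.
    for rate, trials in [(88.9, 18), (86.1, 36)]:
        k = implied_successes(rate, trials)
        print(f"{rate}% of {trials} trials ~ {k} successes "
              f"({success_rate(k, trials):.1f}%)")
```

With only 18 real-robot trials, a single additional failure changes the success rate by about 5.6 percentage points, so small differences between methods at this sample size should be read with care.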