Abstract: In this work, we tackle 6-DoF grasp detection for transparent and specular objects, which is an important yet challenging problem in vision-based robotic systems, due to the failure of depth cameras in sensing their geometry. We, for the first time, propose a multiview RGB-based 6-DoF grasp detection network, GraspNeRF, that leverages the generalizable neural radiance field (NeRF) to achieve material-agnostic object grasping in clutter. Compared to the existing NeRF-based 3-DoF grasp detection methods that rely on densely captured input images and time-consuming per-scene optimization, our system can perform zero-shot NeRF construction with sparse RGB inputs and reliably detect 6-DoF grasps, both in real-time. The proposed framework jointly learns generalizable NeRF and grasp detection in an end-to-end manner, optimizing the scene representation construction for the grasping. For training data, we generate a large-scale photorealistic domain-randomized synthetic dataset of grasping in cluttered tabletop scenes that enables direct transfer to the real world. Our extensive experiments in synthetic and real-world environments demonstrate that our method significantly outperforms all the baselines in all the experiments while remaining in real-time. Project page can be found at https://pku-epic.github.io/GraspNeRF
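To make the idea of building a scene representation from a few RGB views without per-scene optimization more concrete, here is a minimal sketch of one common pattern in generalizable radiance-field methods: project 3D query points into every view, sample the image features there, and fuse them across views. This is not GraspNeRF's actual network. The function names `project_points` and `aggregate_views`, the nearest-pixel sampling, and the mean/variance fusion are all assumptions chosen for illustration.

```python
import numpy as np


def project_points(points_world, K, world_to_cam):
    """Project (N, 3) world points into a pinhole camera.

    K is a 3x3 intrinsic matrix and world_to_cam a 4x4 extrinsic matrix.
    Returns (N, 2) pixel coordinates and (N,) depths in the camera frame.
    """
    ones = np.ones((len(points_world), 1))
    homog = np.concatenate([points_world, ones], axis=1)
    cam = (world_to_cam @ homog.T).T[:, :3]
    depth = cam[:, 2]
    # Avoid division by zero for points lying on the camera plane.
    safe_depth = np.where(np.abs(depth) < 1e-8, 1e-8, depth)
    pix = (K @ cam.T).T
    uv = pix[:, :2] / safe_depth[:, None]
    return uv, depth


def aggregate_views(points_world, feature_maps, intrinsics, extrinsics):
    """Fuse per-view features at each 3D point (hypothetical helper).

    feature_maps is a list of (H, W, C) arrays, one per view.
    Returns an (N, 2C) array holding the mean and variance of the sampled
    features over the views in which each point is visible.
    """
    n = len(points_world)
    c = feature_maps[0].shape[2]
    total = np.zeros((n, c))
    total_sq = np.zeros((n, c))
    count = np.zeros((n, 1))
    for feat, K, T in zip(feature_maps, intrinsics, extrinsics):
        h, w, _ = feat.shape
        uv, depth = project_points(points_world, K, T)
        # Nearest-pixel lookup; a real system would likely interpolate.
        u = np.rint(uv[:, 0]).astype(int)
        v = np.rint(uv[:, 1]).astype(int)
        visible = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        sampled = np.zeros((n, c))
        sampled[visible] = feat[v[visible], u[visible]]
        total += sampled
        total_sq += sampled ** 2
        count += visible[:, None]
    denom = np.maximum(count, 1)  # points seen by no view keep zeros
    mean = total / denom
    var = np.maximum(total_sq / denom - mean ** 2, 0.0)
    return np.concatenate([mean, var], axis=1)
```

Because the fusion is a fixed function of the input views, a new scene only requires a forward pass. A learned version would replace the raw feature maps with CNN features and feed the fused volume to downstream heads, such as a grasp predictor.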

What problem does this paper attempt to address?

The paper addresses 6-degree-of-freedom (6-DoF) grasp detection for transparent and specular objects in vision-based robotic systems. The problem is important but challenging because depth cameras fail to sense the geometry of these objects. The paper proposes GraspNeRF, a 6-DoF grasp detection network that takes multi-view RGB images as input and uses a generalizable neural radiance field (NeRF) to achieve material-agnostic object grasping, especially in cluttered environments.

### Main Problems

1. **Grasp Detection for Transparent and Specular Objects**:
   - Depth cameras perform poorly on transparent and specular objects, so the geometry of these objects cannot be perceived accurately.
   - Existing methods usually rely on depth images, which contain missing or incorrect values on transparent and specular objects.
2. **Real-Time Performance and Generalization Ability**:
   - Existing NeRF-based methods require densely captured input images and time-consuming per-scene optimization, so they cannot achieve real-time grasping.
   - These methods need to retrain the NeRF when handling multi-object sequential grasping, which further increases time and computational cost.

### Solutions

The paper proposes GraspNeRF, a 6-DoF grasp detection network based on multi-view RGB images, with the following characteristics:

- **Generalizable NeRF**: Uses a generalizable NeRF (such as MVSNeRF and NeuRay) to aggregate features from multi-view observations, with no per-scene training.
- **Sparse Input**: Requires only a small number (6) of sparse RGB input images to achieve real-time grasping.
- **End-to-End Learning**: Jointly learns the generalizable NeRF and grasp detection, so that the scene representation is optimized for grasping performance.
- **Large-Scale Synthetic Dataset**: Generates a large-scale, photorealistic, domain-randomized synthetic dataset containing 2.4 million images and 100,000 scenes for training, which enables direct generalization to new real-world scenes.

### Experimental Results

- **Simulation Experiments**:
  - Single-Object Grasping: GraspNeRF has an average success rate of 86.1% in 36 trials, significantly outperforming all baseline methods.
  - Sequential Clutter Removal: In stacking and arranging scenes, the grasp success rate and clearance rate of GraspNeRF on transparent and specular objects are significantly higher than those of other methods.
- **Real-Robot Experiments**:
  - Single-Object Grasping: GraspNeRF has a success rate of 88.9% in 18 trials, outperforming all baseline methods.
  - Sequential Clutter Removal: In stacking and arranging scenes, GraspNeRF has the highest grasp success rate and clearance rate on various materials.

### Conclusion

GraspNeRF achieves real-time 6-DoF grasp detection for transparent and specular objects by combining a generalizable NeRF with sparse RGB inputs. The method outperforms the baselines in both simulation and real-robot experiments and transfers from synthetic training data to real-world scenes.
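As a quick sanity check on the reported single-object success rates, the sketch below searches for the whole-number success counts that would round to a given percentage for a given number of trials. It assumes each percentage is a plain ratio of successful grasps to trials, which the summary does not state. The helper name `consistent_counts` and the 0.05 rounding tolerance are illustrative choices, and the resulting counts are inferred, not reported.

```python
def consistent_counts(percent, trials, tol=0.05):
    """Return every success count k (0..trials) whose rate 100*k/trials
    lies within `tol` percentage points of the reported `percent`.

    Hypothetical helper for checking a reported rate against a trial count.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    return [k for k in range(trials + 1)
            if abs(100.0 * k / trials - percent) < tol]


# Simulation: 86.1% reported over 36 trials.
print(consistent_counts(86.1, 36))  # [31]

# Real robot: 88.9% reported over 18 trials.
print(consistent_counts(88.9, 18))  # [16]
```

Under that assumption, the figures are consistent with 31 of 36 successful grasps in simulation and 16 of 18 on the real robot. With so few trials, a single grasp shifts the rate by roughly 3 to 6 percentage points.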

GraspNeRF: Multiview-based 6-DoF Grasp Detection for Transparent and Specular Objects Using Generalizable NeRF

RGBGrasp: Image-based Object Grasping by Capturing Multiple Views during Robot Arm Movement with Neural Radiance Fields

Efficient Grasp Detection Network with Gaussian-Based Grasp Representation for Robotic Manipulation

ASGrasp: Generalizable Transparent Object Reconstruction and 6-Dof Grasp Detection from RGB-D Active Stereo Camera

MonoGraspNet: 6-DoF Grasping with a Single RGB Image

Dex-NeRF: Using a Neural Radiance Field to Grasp Transparent Objects

RGB Matters: Learning 7-DoF Grasp Poses on Monocular RGBD Images

Learning Any-View 6DoF Robotic Grasping in Cluttered Scenes via Neural Surface Rendering

ASGrasp: Generalizable Transparent Object Reconstruction and Grasping from RGB-D Active Stereo Camera

Gradient based Grasp Pose Optimization on a NeRF that Approximates Grasp Success

6-DoF grasp estimation method that fuses RGB-D data based on external attention

6-DoF Grasp Pose Evaluation and Optimization via Transfer Learning from NeRFs

Region-aware Grasp Framework with Normalized Grasp Space for Efficient 6-DoF Grasping

One-Shot Neural Fields for 3D Object Understanding

High Precision 6-DoF Grasp Detection in Cluttered Scenes Based on Network Optimization and Pose Propagation

NeuroGrasp: Multimodal Neural Network With Euler Region Regression for Neuromorphic Vision-Based Grasp Pose Estimation

Object Detection and Pose Estimation from RGB and Depth Data for Real-time, Adaptive Robotic Grasping

MTGrasp: Multiscale 6-Dof Robotic Grasp Detection

6-DoF Grasp Detection in Clutter with Enhanced Receptive Field and Graspable Balance Sampling

Deep Vision Networks for Real-Time Robotic Grasp Detection

Rethinking 6-Dof Grasp Detection: A Flexible Framework for High-Quality Grasping