Real-Time Pixel-Wise Grasp Detection Based on RGB-D Feature Dense Fusion

Yongxiang Wu, Yili Fu, Shuguo Wang
DOI: https://doi.org/10.1109/icma52036.2021.9512605
2021-01-01
Abstract: This paper presents a real-time fully convolutional network for detecting the grasp pose and confidence of each pixel in RGB-D images. Instead of processing RGB and depth data identically, we transform the depth image into a point cloud and use a heterogeneous architecture to embed and densely fuse RGB-D information into semantically rich features. To improve computational efficiency, we propose a novel point sampling and matching mechanism and integrate it into the dense fusion. A proposed Uniform Index Sampling (UIS) algorithm samples points uniformly and quickly, and the corresponding color and geometry features are matched via a designed Index Image, which is also used for the consistent transformation of RGB-D data. By making effective use of RGB-D information, our model achieves accuracies of 99.1% on the Cornell dataset and 96.4% on the Jacquard dataset, outperforming current state-of-the-art methods. Moreover, benefiting from the efficient point sampling and matching mechanism, our method runs at a real-time speed of 8 milliseconds per frame. The proposed method is robust in physical grasping and achieves a success rate of 97% on a household set, 90% on an adversarial set, and 91% when grasping in clutter.
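The abstract does not specify how UIS or the Index Image are implemented, so the following is only a minimal sketch of one plausible reading: depth pixels are back-projected with a pinhole camera model, pixel indices are sampled on a regular grid, and an index image records which sampled point each pixel maps to so that color and geometry features can be paired by index. All function names, the grid-stride sampling rule, and the camera intrinsics (fx, fy, cx, cy) are assumptions made for illustration and are not taken from the paper.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (list of rows) with a pinhole model.

    Returns a flat list of (x, y, z) tuples in row-major pixel order.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def uniform_index_sample(height, width, stride):
    """Pick flat pixel indices on a regular grid.

    Hypothetical stand-in for UIS; the paper's actual rule may differ.
    """
    return [v * width + u
            for v in range(0, height, stride)
            for u in range(0, width, stride)]


def build_index_image(height, width, sampled):
    """Store, per pixel, the position of its sampled point, or -1 if unsampled."""
    index_image = [[-1] * width for _ in range(height)]
    for k, flat in enumerate(sampled):
        index_image[flat // width][flat % width] = k
    return index_image


def match_features(color_features, points, sampled):
    """Pair the color feature and the 3D point that share a flat pixel index."""
    return [(color_features[i], points[i]) for i in sampled]
```

In this sketch, sampling operates on pixel indices rather than on 3D coordinates, so matching a point to its color feature is a direct lookup instead of a nearest-neighbor search, which is one way such a mechanism could stay cheap.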