Abstract: This paper addresses the challenge of robotic grasping of general objects. As in prior research, the task takes as input a single-view 3D observation (i.e., a point cloud) captured by a depth camera. Crucially, successful object grasping demands a comprehensive understanding of the shapes of objects in the scene. However, single-view observations often suffer from occlusions (both self-occlusions and inter-object occlusions), which leave gaps in the point cloud, especially in complex cluttered scenes. The resulting incomplete perception of object shape frequently causes grasp failures or inaccurate pose estimation. In this paper, we tackle this issue with a simple yet effective solution: completing grasping-related scene regions through local occupancy prediction. Following prior practice, the proposed model first proposes a number of the most likely grasp points in the scene. Around each grasp point, a dedicated module infers whether each voxel in its neighborhood is empty or occupied by some object. Importantly, the occupancy map is inferred by fusing both local and global cues. We implement a multi-group tri-plane scheme to efficiently aggregate long-distance contextual information. The model then estimates 6-DoF grasp poses using the local occupancy-enhanced object shape information and returns the top-ranked grasp proposal. Comprehensive experiments on both the large-scale GraspNet-1Billion benchmark and a real robotic arm demonstrate that the proposed method can effectively complete the unobserved parts of cluttered and occluded scenes. Benefiting from the occupancy-enhanced features, our model clearly outperforms competing methods on various performance metrics, such as grasping average precision.
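The abstract describes the occupancy module only at a high level. As a rough illustration of why a tri-plane scheme aggregates long-distance context cheaply, the following sketch averages point features onto three orthogonal planes, so that each 2D cell summarizes every point along one axis across the whole scene, and then reads a local voxel's feature back as the sum of the three cells it projects onto. Everything here is an assumption made for illustration, not the paper's implementation: the hypothetical helpers (`build_triplane`, `query_triplane`, `fused_voxel_features`), the reading of "multi-group" as several tri-planes rotated about the vertical axis, the cell and voxel sizes, and the omission of both the local feature branch and the learned occupancy decoder.

```python
import math

# Axis pairs for the three orthogonal planes: xy, xz, yz.
PLANES = ((0, 1), (0, 2), (1, 2))


def rotation_z(theta):
    """3x3 rotation about the z-axis; ASSUMPTION: each tri-plane group is one such rotation."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def rotate(R, p):
    """Apply a 3x3 rotation matrix R to a 3D point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))


def cell_key(q, a, b, cell):
    """Index of the 2D grid cell that point q falls into on the plane spanned by axes a and b."""
    return (math.floor(q[a] / cell), math.floor(q[b] / cell))


def build_triplane(points, feats, R, cell):
    """Average per-point features into 2D cells on each of the three planes of one group."""
    planes = []
    for a, b in PLANES:
        sums, counts = {}, {}
        for p, f in zip(points, feats):
            key = cell_key(rotate(R, p), a, b, cell)
            acc = sums.setdefault(key, [0.0] * len(f))
            for i, v in enumerate(f):
                acc[i] += v
            counts[key] = counts.get(key, 0) + 1
        planes.append({k: [s / counts[k] for s in v] for k, v in sums.items()})
    return planes


def query_triplane(planes, R, cell, x, dim):
    """Sum the features of the three plane cells that the query point x projects onto."""
    q = rotate(R, x)
    out = [0.0] * dim
    for (a, b), plane in zip(PLANES, planes):
        f = plane.get(cell_key(q, a, b, cell), [0.0] * dim)  # empty cell -> zeros
        out = [o + v for o, v in zip(out, f)]
    return out


def local_voxel_centers(center, half_extent, voxel):
    """Centers of a cubic voxel grid of side 2 * half_extent around a grasp point."""
    n = int(round(2 * half_extent / voxel))
    start = -half_extent + voxel / 2
    offsets = [start + i * voxel for i in range(n)]
    cx, cy, cz = center
    return [(cx + dx, cy + dy, cz + dz) for dx in offsets for dy in offsets for dz in offsets]


def fused_voxel_features(points, feats, grasp_point, groups=3, cell=0.02,
                         half_extent=0.05, voxel=0.01):
    """Per-voxel features around one grasp point, summed over all tri-plane groups.

    A learned decoder (omitted here) would map each fused feature to an occupancy probability.
    """
    dim = len(feats[0])
    rotations = [rotation_z(2 * math.pi * g / groups) for g in range(groups)]
    triplanes = [build_triplane(points, feats, R, cell) for R in rotations]
    result = []
    for c in local_voxel_centers(grasp_point, half_extent, voxel):
        fused = [0.0] * dim
        for R, planes in zip(rotations, triplanes):
            fused = [u + v for u, v in zip(fused, query_triplane(planes, R, cell, c, dim))]
        result.append((c, fused))
    return result


# Usage: three points with a unit feature each, queried around the origin.
points = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.5, 0.5, 0.5)]
feats = [[1.0], [1.0], [1.0]]
voxels = fused_voxel_features(points, feats, grasp_point=(0.0, 0.0, 0.0))
print(len(voxels))  # 1000 voxels in a 10 x 10 x 10 local grid
```

With unit features, voxels whose projections land in cells that contain points receive nonzero fused features, while voxels far from every point receive zeros. In a full model, a learned decoder would combine these global tri-plane features with local point features to predict per-voxel occupancy.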

UPG: 3D Vision-Based Prediction Framework for Robotic Grasping in Multi-Object Scenes

Robotic Grasping in Multi-Object Stacking Scenes Based on Visual Reasoning

A Robotic Semantic Grasping Method for Pick-and-place Tasks

SUGrasping: a Semantic Grasping Framework Based on Multi-Head 3D U-Net

A Vision-based Robot Grasping System

6D Pose Estimation with Combined Deep Learning and 3D Vision Techniques for a Fast and Accurate Object Grasping

A Robust Pixel-Wise Prediction Network with Applications to Industrial Robotic Grasping

A grasping posture estimation method based on 3D detection network

A Semantic Robotic Grasping Framework Based on Multi-Task Learning in Stacking Scenes

Robotic Grasping Method Based on 3D Vision for Stacked Rectangular Objects

SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images

A Novel Robotic Grasp Detection Framework Using Low-Cost RGB-D Camera for Industrial Bin Picking

Visual Reconstruction and Localization-Based Robust Robotic 6-Dof Grasping in the Wild

Deep learning-based method for vision-guided robotic grasping of unknown objects

Antipodal-Points-aware Dual-decoding Network for Robotic Visual Grasp Detection Oriented to Multi-object Clutter Scenes

Dealing with Ambiguity in Robotic Grasping via Multiple Predictions

GoalGrasp: Grasping Goals in Partially Occluded Scenarios without Grasp Training

Design of an Unordered Gripping System Based on 3D Vision

Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection

Robotic Grasping With Multi-View Image Acquisition and Model-Based Pose Estimation

Target-Oriented Object Grasping via Multimodal Human Guidance