Abstract: Grasping in cluttered scenes remains highly challenging for dexterous hands due to the scarcity of data. To address this problem, we present a large-scale synthetic benchmark, encompassing 1319 objects, 8270 scenes, and 427 million grasps. Beyond benchmarking, we also propose a novel two-stage grasping method that learns efficiently from data by using a diffusion model that conditions on local geometry. Our proposed generative method outperforms all baselines in simulation experiments. Furthermore, with the aid of test-time depth restoration, our method demonstrates zero-shot sim-to-real transfer, attaining 90.7% real-world dexterous grasping success rate in cluttered scenes.

What problem does this paper attempt to address?

This paper attempts to address the challenges in dexterous grasping in cluttered scenes, especially in the case of data scarcity. Specifically, the paper mainly solves the following problems:

1. **Data Scarcity Problem**: Existing datasets are either too small, or contain loosely-placed objects, or rely on simple search methods, all of which limit the development of algorithms. To solve this problem, the authors propose a large-scale synthetic benchmark dataset, DexGraspNet 2.0, which contains 1,319 objects, 8,270 scenes and 427 million grasping labels.
2. **Grasping Distribution in Complex Scenes**: The effective grasping distribution in cluttered scenes is very complex. Methods that directly regress grasping parameters often converge to the average or median pose, resulting in penetration or inaccurate contact. For this reason, the authors propose a method based on a generative model, which can predict the grasping pose distribution according to local geometric features, thus better handling multi-modal grasping distributions.
3. **Generalization Ability**: The observational variation in cluttered scenes is much greater than that in single-object grasping tasks, which places higher requirements on the generalization ability of the model. By using a generative model conditioned on local features, the authors' method can better utilize the diverse local geometric variations in the dataset, thereby improving the generalization ability to new objects and new scenes.

### Main Contributions

1. **Large-Scale Synthetic Benchmark Dataset**: DexGraspNet 2.0 contains 1,319 objects, 8,270 scenes and 427 million grasping labels, and is one of the largest dexterous grasping datasets currently available.
2. **Two-Stage Grasping Method**: A two-stage grasping method is proposed, which uses a diffusion model to efficiently learn the grasping pose distribution based on local point features.
3. **Systematic Evaluation and Verification**: The effectiveness of the design choices is verified through systematic simulation experiments and ablation studies, and a 90.7% success rate is achieved in the real world, demonstrating the practicality of this method.

### Summary

By constructing a large-scale synthetic dataset and proposing a two-stage grasping method based on a generative model, this paper successfully solves the data scarcity and complex distribution problems faced by dexterous grasping in cluttered scenes, and significantly improves the generalization ability of the model.
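The claim in the second problem above, that direct regression tends to collapse a multi-modal grasp distribution to an average pose, can be illustrated with a toy one-dimensional example. This is only a sketch under assumptions and not the paper's method: the "pose" is a single scalar with two hypothetical valid modes at -1 and +1, the regressor is reduced to the constant that minimizes mean squared error, and the generative model is replaced by a stand-in that resamples the training targets. All function and variable names are made up for illustration.

```python
import numpy as np


def mse_optimal_constant(targets):
    """Constant prediction that minimizes mean squared error: the sample mean."""
    return float(np.mean(targets))


def sample_from_modes(targets, rng, n):
    """Stand-in for a generative model: draw n samples from the empirical targets."""
    return rng.choice(targets, size=n)


def distance_to_nearest_mode(x):
    """Distance from x to the closer of the two assumed valid modes (-1 and +1)."""
    return np.minimum(np.abs(x + 1.0), np.abs(x - 1.0))


rng = np.random.default_rng(0)

# Toy data: two equally valid grasp modes for one scalar pose parameter,
# e.g. approaching the object from the left (-1) or from the right (+1).
left = rng.normal(-1.0, 0.05, size=500)
right = rng.normal(1.0, 0.05, size=500)
targets = np.concatenate([left, right])

regressed = mse_optimal_constant(targets)      # lands between the two modes
samples = sample_from_modes(targets, rng, 10)  # each sample stays near one mode

print("regressed pose:", regressed)
print("regressed distance to nearest mode:", distance_to_nearest_mode(regressed))
print("worst sampled distance to nearest mode:", distance_to_nearest_mode(samples).max())
```

In this toy setup the regressed value sits near zero, far from either valid mode, while every drawn sample stays close to one of them. A real grasp pose is high-dimensional, but the same averaging effect is what the summary associates with penetration or inaccurate contact.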

DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes
