Jialiang Zhang, Haoran Liu, Danshi Li, Xinqiang Yu, Haoran Geng, Yufei Ding, Jiayi Chen, He Wang
Abstract: Grasping in cluttered scenes remains highly challenging for dexterous hands due to the scarcity of data. To address this problem, we present a large-scale synthetic benchmark, encompassing 1319 objects, 8270 scenes, and 427 million grasps. Beyond benchmarking, we also propose a novel two-stage grasping method that learns efficiently from data by using a diffusion model that conditions on local geometry. Our proposed generative method outperforms all baselines in simulation experiments. Furthermore, with the aid of test-time depth restoration, our method demonstrates zero-shot sim-to-real transfer, attaining a 90.7% real-world dexterous grasping success rate in cluttered scenes.
What problem does this paper attempt to address?
This paper addresses dexterous grasping in cluttered scenes, where progress has been held back by a lack of data. It targets three problems:
1. **Data Scarcity Problem**: Existing datasets are too small, contain only loosely placed objects, or rely on simple search methods, all of which limit algorithm development. To address this, the authors propose a large-scale synthetic benchmark dataset, DexGraspNet 2.0, which contains 1,319 objects, 8,270 scenes and 427 million grasping labels.
2. **Grasping Distribution in Complex Scenes**: The distribution of valid grasps in cluttered scenes is highly complex. Methods that directly regress grasping parameters often converge to the average or median pose, which can lie between valid grasps and result in penetration or inaccurate contact. The authors therefore propose a method based on a generative model that predicts the grasping pose distribution from local geometric features, which handles multi-modal grasping distributions better.
3. **Generalization Ability**: Observations vary much more in cluttered scenes than in single-object grasping tasks, which places higher demands on the model's generalization ability. Because the generative model is conditioned on local features, the authors' method can better exploit the diverse local geometric variations in the dataset, improving generalization to new objects and new scenes.
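The mode-averaging failure described in item 2 can be illustrated with a toy example. This is a minimal sketch, not the paper's method: the one-dimensional "approach angle", the two modes at -90 and +90 degrees, and all function names are assumptions made up for illustration. It compares the constant prediction that minimizes squared error (the sample mean) with a prediction drawn from the data distribution, which stands in for what a generative model would output.

```python
import random
import statistics

MODES = (-90.0, 90.0)  # hypothetical valid approach angles, in degrees


def sample_grasp_angles(n, seed=0):
    """Draw n angles from a toy bimodal distribution centred on MODES."""
    rng = random.Random(seed)
    return [rng.gauss(rng.choice(MODES), 5.0) for _ in range(n)]


def distance_to_nearest_mode(angle):
    """How far a predicted angle is from the closest valid mode."""
    return min(abs(angle - m) for m in MODES)


angles = sample_grasp_angles(2000)

# A regressor trained with squared error on an uninformative input
# converges to the mean of the targets, which falls between the modes.
mean_prediction = statistics.fmean(angles)

# A generative model outputs samples from the distribution instead;
# drawing one training sample is a crude stand-in for that behaviour.
sampled_prediction = random.Random(1).choice(angles)

print("regression error:", distance_to_nearest_mode(mean_prediction))
print("sampling error:  ", distance_to_nearest_mode(sampled_prediction))
```

In this toy setting the mean lands near 0 degrees, far from either valid mode, while a sample stays close to one of them. Real grasp poses are high-dimensional, so this only conveys the intuition.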
### Main Contributions
1. **Large-Scale Synthetic Benchmark Dataset**: DexGraspNet 2.0 contains 1,319 objects, 8,270 scenes and 427 million grasping labels, and is one of the largest dexterous grasping datasets currently available.
2. **Two-Stage Grasping Method**: A two-stage grasping method is proposed, which uses a diffusion model to efficiently learn the grasping pose distribution based on local point features.
3. **Systematic Evaluation and Verification**: Systematic simulation experiments and ablation studies verify the design choices, and the method achieves a 90.7% grasping success rate in real-world cluttered scenes, demonstrating its practicality.
### Summary
By constructing a large-scale synthetic dataset and proposing a two-stage grasping method based on a generative model, this paper tackles the data scarcity and complex grasp distribution problems faced by dexterous grasping in cluttered scenes, and improves the model's generalization to new objects and scenes.