Abstract: As the basis for prehensile manipulation, it is vital to enable robots to grasp as robustly as humans. Our innate grasping system is prompt, accurate, flexible, and continuous across spatial and temporal domains. Few existing methods cover all these properties for robot grasping. In this paper, we propose AnyGrasp for grasp perception to give robots these abilities using a parallel gripper. Specifically, we develop a dense supervision strategy with real perception and analytic labels in the spatial-temporal domain. Additional awareness of objects' center-of-mass is incorporated into the learning process to help improve grasping stability. Utilization of grasp correspondence across observations enables dynamic grasp tracking. Our model can efficiently generate accurate, 7-DoF, dense, and temporally-smooth grasp poses and works robustly against large depth-sensing noise. Using AnyGrasp, we achieve a 93.3% success rate when clearing bins with over 300 unseen objects, which is on par with human subjects under controlled conditions. Over 900 mean-picks-per-hour is reported on a single-arm system. For dynamic grasping, we demonstrate catching swimming robot fish in the water. Our project page is at https://graspnet.net/anygrasp.html
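To make the term "7-DoF grasp pose" concrete, here is a minimal sketch of how such a pose for a parallel gripper could be stored: three degrees of freedom for position, three for orientation, and one for gripper opening width. The class name `Grasp7DoF`, the helper `fingertips`, and the convention that the second column of the rotation matrix is the finger-closing axis are all assumptions made for illustration. They are not taken from the paper or from the AnyGrasp library.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Grasp7DoF:
    """Hypothetical container for one parallel-gripper grasp (illustrative only)."""
    position: Vec3               # gripper center in the camera frame, meters (3 DoF)
    rotation: List[List[float]]  # 3x3 rotation matrix, row-major (3 DoF)
    width: float                 # opening between the two fingers, meters (1 DoF)
    score: float = 0.0           # predicted grasp quality; higher is better


def fingertips(grasp: Grasp7DoF) -> Tuple[Vec3, Tuple[float, ...]]:
    """Return the two fingertip positions implied by the grasp.

    Assumption: the closing direction is the second column of the rotation matrix.
    """
    axis = tuple(grasp.rotation[i][1] for i in range(3))
    half = grasp.width / 2.0
    left = tuple(p - half * a for p, a in zip(grasp.position, axis))
    right = tuple(p + half * a for p, a in zip(grasp.position, axis))
    return left, right
```

For example, a grasp centered 0.5 m in front of the camera with identity orientation and an 8 cm opening places the fingertips 4 cm to either side of the center along the assumed closing axis.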

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses robust and efficient grasp perception for robots in the spatial and temporal domains. It proposes a method named AnyGrasp that aims to let robots grasp objects as quickly, accurately, and flexibly as humans. The main problems to be solved include:

1. **Grasp perception in static scenes**:
   - Existing methods for static scenes usually have the following problems:
     - They assume a complete 3D model or contact model of the object, which is difficult to obtain in the real world.
     - They reduce grasp perception to a planar detection problem, which limits the flexibility of subsequent operations.
     - They adopt a sample-then-evaluate approach. Although this generates more grasp poses, computation is slow and dense predictions cannot be produced.
   - AnyGrasp estimates dense 7-degrees-of-freedom (7-DoF) grasp configurations directly from monocular perception input through an end-to-end network, improving both accuracy and efficiency.
2. **Grasp perception in dynamic scenes**:
   - Grasp perception in dynamic scenes is relatively unexplored. Existing methods usually require prior information or a fixed set of grasp trajectories, which is impractical in real applications.
   - AnyGrasp introduces a generation-association method that produces continuous 7-DoF grasp configurations in dynamic scenes and tracks grasps over time through a temporal association module.
3. **The importance of training with real data**:
   - The paper emphasizes training with real data. Simulated data can reduce training costs, but models trained on it then need a high-precision depth camera at inference time to bridge the gap between simulation and reality.
   - By training with real data, AnyGrasp adapts to real-world sensor noise and performs especially well with low-cost cameras.

### Main contributions

1. **Unified system**:
   - Proposes the first unified system for fast, accurate, 7-DoF, and temporally continuous grasp pose detection with a parallel gripper.
2. **Center-of-mass awareness**:
   - Incorporates awareness of the object's center of mass (CoM) into learning to improve grasp stability.
3. **Robustness verification**:
   - Verifies the robustness of the method through extensive experiments. Trained on a dataset of only 144 real objects, it achieves performance comparable to that of humans on a variety of challenging tasks.
4. **High-performance library**:
   - Releases a grasping library whose success rate is comparable to that of humans on more than 300 unseen objects, at more than 900 mean picks per hour (MPPH).
5. **Dataset analysis**:
   - Provides a detailed analysis of different training factors, such as the choice between real and simulated data, the influence of annotation density, and the importance of scene diversity.

### Conclusion

By proposing the AnyGrasp system, the paper addresses the key problems of robust and efficient grasp perception for robots in static and dynamic scenes. The system performs well in static scenes and also achieves continuous grasp tracking in dynamic scenes. Training with real data further improves its robustness and adaptability. These contributions provide useful references for future research.
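The summary above mentions that grasps are tracked across frames by associating them over time. As a rough illustration of what "association" means, the sketch below greedily matches grasp centers in one frame to the nearest unmatched grasp center in the next frame. This is a simple geometric stand-in, not the paper's method: the paper uses a learned association module. The function name `associate_grasps` and the 5 cm distance threshold are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def associate_grasps(
    prev_centers: List[Point],
    curr_centers: List[Point],
    max_dist: float = 0.05,  # hypothetical gating threshold, meters
) -> List[Tuple[int, int]]:
    """Greedy nearest-neighbor matching of grasp centers between two frames.

    Returns (prev_index, curr_index) pairs. A previous grasp stays unmatched
    when no unused current grasp lies within max_dist of it.
    """
    pairs: List[Tuple[int, int]] = []
    used = set()
    for i, p in enumerate(prev_centers):
        best_j, best_d = None, max_dist
        for j, c in enumerate(curr_centers):
            if j in used:
                continue  # each current grasp is matched at most once
            d = math.dist(p, c)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs
```

Greedy matching depends on iteration order and can be suboptimal when grasps are densely packed. It is used here only to keep the sketch short.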

AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains
