Hao-Shu Fang, Chenxi Wang, Hongjie Fang, Minghao Gou, Jirong Liu, Hengxu Yan, Wenhai Liu, Yichen Xie, Cewu Lu
Abstract: As the basis for prehensile manipulation, it is vital to enable robots to grasp as robustly as humans. Our innate grasping system is prompt, accurate, flexible, and continuous across spatial and temporal domains. Few existing methods cover all these properties for robot grasping. In this paper, we propose AnyGrasp for grasp perception to enable robots these abilities using a parallel gripper. Specifically, we develop a dense supervision strategy with real perception and analytic labels in the spatial-temporal domain. Additional awareness of objects' center-of-mass is incorporated into the learning process to help improve grasping stability. Utilization of grasp correspondence across observations enables dynamic grasp tracking. Our model can efficiently generate accurate, 7-DoF, dense, and temporally-smooth grasp poses and works robustly against large depth-sensing noise. Using AnyGrasp, we achieve a 93.3% success rate when clearing bins with over 300 unseen objects, which is on par with human subjects under controlled conditions. Over 900 mean-picks-per-hour is reported on a single-arm system. For dynamic grasping, we demonstrate catching swimming robot fish in the water. Our project page is at https://graspnet.net/anygrasp.html
What problem does this paper attempt to address?
### Problems the paper attempts to solve
This paper addresses robust and efficient grasp perception for robots across the spatial and temporal domains. Specifically, it proposes a method named AnyGrasp that lets robots grasp objects as quickly, accurately, and flexibly as humans. The main problems it tackles are:
1. **Grasp perception in static scenes**:
- Existing methods for grasp perception in static scenes typically suffer from one of the following problems:
- They assume a complete 3D model or contact model of the object, which is rarely available in the real world.
- They simplify grasp perception to planar grasp detection, which limits the flexibility of subsequent manipulation.
- They follow a sample-then-evaluate pipeline, which yields more grasp poses but is slow and cannot produce dense predictions.
- AnyGrasp instead estimates dense 7-degree-of-freedom (7-DoF) grasp configurations directly from monocular perception input with an end-to-end network, improving both the accuracy and the efficiency of grasping.
2. **Grasp perception in dynamic scenes**:
- Grasp perception in dynamic scenes is relatively unexplored. Existing methods usually require prior information or a fixed set of grasp trajectories, which is impractical in real applications.
- AnyGrasp introduces a new generation-association method that generates continuous 7-DoF grasp configurations in dynamic scenes and tracks grasps over time through a temporal association module.
3. **The importance of training with real data**:
- The paper emphasizes the importance of training with real data. Simulated data can reduce training costs, but a high-precision depth camera is then required at inference time to bridge the gap between simulation and reality.
- By training with real data, AnyGrasp adapts to real-world sensing noise and performs especially well on low-cost cameras.
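To make the 7-DoF grasp configuration and the idea of associating grasps across frames more concrete, here is a minimal Python sketch. It assumes the common reading of 7-DoF for a parallel gripper (3D translation, 3D rotation, and gripper opening width). The class `Grasp7DoF`, the function `associate`, and the `max_dist` threshold are hypothetical names introduced for illustration only. The matching step is a plain greedy nearest-neighbor heuristic standing in for the paper's temporal association module, which is a learned component; it is not the authors' method.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Grasp7DoF:
    """Hypothetical container: 3 translation + 3 rotation + 1 width = 7 DoF."""
    position: Tuple[float, float, float]  # gripper center in the camera frame (m)
    rotation: Tuple[float, float, float]  # roll, pitch, yaw (rad); an assumed parameterization
    width: float                          # opening of the parallel gripper (m)
    score: float = 0.0                    # predicted grasp quality


def associate(prev: List[Grasp7DoF], curr: List[Grasp7DoF],
              max_dist: float = 0.05) -> Dict[int, int]:
    """Greedily match each previous-frame grasp to the closest unused
    current-frame grasp whose center lies within max_dist meters.

    This is an illustrative stand-in, not the learned association
    module described in the paper.
    """
    matches: Dict[int, int] = {}
    used = set()
    for i, g in enumerate(prev):
        best_j, best_d = None, max_dist
        for j, h in enumerate(curr):
            if j in used:
                continue
            d = math.dist(g.position, h.position)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches
```

Calling `associate(previous_grasps, current_grasps)` returns a dictionary from indices in the previous frame to indices in the current frame; grasps with no candidate within `max_dist` are simply left unmatched, which is one simple way to let a tracked grasp drop out when the object moves away.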
### Main contributions
1. **Unified system**:
- Proposed the first unified system for fast, accurate, 7-DoF, and temporally continuous grasp pose detection with a parallel gripper.
2. **Center-of-mass awareness**:
- Incorporated awareness of each object's center of mass (also called center of gravity, COG) into the learning process to improve grasp stability.
3. **Robustness verification**:
- Verified the robustness of the method through extensive experiments. Trained on a dataset of only 144 real objects, it achieves performance comparable to that of humans on a variety of challenging tasks.
4. **High-performance library**:
- Released a grasping library whose success rate on more than 300 unseen objects is comparable to that of human subjects, with over 900 mean picks per hour (MPPH) on a single-arm system.
5. **Dataset analysis**:
- Provided a detailed analysis of different training factors, such as the choice between real and simulated data, the influence of annotation density, and the importance of scene diversity.
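The two headline metrics above, success rate and mean picks per hour, are simple ratios. The sketch below shows how they could be computed from a bin-clearing log, assuming that MPPH counts successful picks per hour of wall-clock time. The function names and the numbers in the usage note are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the paper's experiments.

```python
def success_rate(successes: int, attempts: int) -> float:
    """Fraction of grasp attempts that succeeded."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return successes / attempts


def mean_picks_per_hour(successes: int, elapsed_seconds: float) -> float:
    """Successful picks normalized to one hour of operation
    (assumed definition of MPPH)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return successes * 3600.0 / elapsed_seconds
```

For instance, with made-up numbers, 28 successes out of 30 attempts gives a success rate of about 93.3%, and 15 successful picks in 60 seconds corresponds to 900 picks per hour.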
### Conclusion
With AnyGrasp, the paper addresses the key problems of robust and efficient grasp perception for robots in both static and dynamic scenes. The system performs well in static scenes and also achieves continuous grasp tracking in dynamic scenes. Training with real data further improves its robustness to sensing noise and its adaptability to low-cost cameras. The released library and the analysis of training factors offer a reference point for future research.