Robotic Grasp Detection Based on Category-Level Object Pose Estimation with Self-Supervised Learning

Sheng Yu, Di-Hua Zhai, Yuanqing Xia
DOI: https://doi.org/10.1109/tmech.2023.3287635
2024-01-01
Abstract: 6-D object pose estimation is widely used in robotic grasping, and a series of object pose estimation methods have been proposed. Among them, category-level object pose estimation methods have been widely researched in recent years. Category-level object pose estimation is mainly used to estimate the pose of unknown objects belonging to the same category, and it has been applied to robotic grasping and augmented reality. Most current methods rely on large datasets and labels, which poses challenges. To address this problem, we propose a new category-level object pose estimation network, SCNet, which not only transfers from the simulation environment to the real world but can also be trained in a self-supervised manner, compensating for the lack of large-scale labeled datasets. Because 3-D models of unknown objects are unavailable to the network, we introduce a prior point cloud of objects in the same category and propose a deformation module based on RGB images and the prior point cloud, which deforms the prior point cloud into the target objects in the scene. Moreover, we propose a transformer-based recurrent refinement module that further refines the deformed structure to better fit the target objects. We evaluated the method on the CAMERA25 and REAL275 datasets, and the experimental results show that it outperforms current self-supervised category-level pose estimation methods as well as some supervised category-level pose estimation methods. Finally, we apply SCNet to object pose estimation in the real world and perform a series of robotic grasping tasks on a Baxter robot.