CenterArt: Joint Shape Reconstruction and 6-DoF Grasp Estimation of Articulated Objects

Sassan Mokhtar, Eugenio Chisari, Nick Heppert, Abhinav Valada

2024-04-23

Abstract: Precisely grasping and reconstructing articulated objects is key to enabling general robotic manipulation. In this paper, we propose CenterArt, a novel approach for simultaneous 3D shape reconstruction and 6-DoF grasp estimation of articulated objects. CenterArt takes RGB-D images of the scene as input and first predicts the shape and joint codes through an encoder. The decoder then leverages these codes to reconstruct 3D shapes and estimate 6-DoF grasp poses of the objects. We further develop a mechanism for generating a dataset of 6-DoF grasp ground truth poses for articulated objects. CenterArt is trained on realistic scenes containing multiple articulated objects with randomized designs, textures, lighting conditions, and realistic depths. We perform extensive experiments demonstrating that CenterArt outperforms existing methods in accuracy and robustness.

Robotics, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses simultaneous 3D shape reconstruction and 6-degree-of-freedom (6-DoF) grasp pose estimation for articulated objects. Existing methods mainly rely on reinforcement learning, which requires large amounts of data and training time and generalizes poorly in practical scenarios. The researchers propose a new method, CenterArt, that takes RGB-D images as input. An encoder predicts shape and joint codes, and a decoder uses these codes to reconstruct 3D shapes and estimate 6-DoF grasp poses.

The innovations of CenterArt include:
1. The first method for simultaneously performing 3D shape reconstruction and 6-DoF grasp pose estimation of articulated objects.
2. A dataset of ground truth grasp poses for articulated objects.
3. Use of the SAPIEN simulator to create realistic kitchen scenes containing multiple articulated objects.

The experiments show that CenterArt outperforms existing methods in accuracy and robustness, especially on real depth images and complex multi-object scenes. Compared with reinforcement learning-based methods, CenterArt also requires less training time and data.
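The encoder-decoder flow described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the names `toy_encoder`, `toy_decoder`, `ObjectCodes`, `Grasp6DoF`, and the latent size `CODE_DIM` are hypothetical, and seeded random numbers stand in for the learned networks, whose architecture the abstract does not specify. The sketch shows only the shape of the data passed between stages, not CenterArt's actual implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

CODE_DIM = 8  # hypothetical latent size, not taken from the paper


@dataclass
class ObjectCodes:
    shape_code: List[float]  # latent describing object geometry
    joint_code: List[float]  # latent describing articulation state


@dataclass
class Grasp6DoF:
    position: Tuple[float, float, float]     # x, y, z
    orientation: Tuple[float, float, float]  # roll, pitch, yaw (radians)


def toy_encoder(rgbd: List[List[float]], num_objects: int) -> List[ObjectCodes]:
    """Stand-in for the encoder: one (shape, joint) code pair per object.

    A real encoder would be a learned network; here the codes are
    pseudo-random values seeded by the image so the run is repeatable.
    """
    rng = random.Random(repr(rgbd))

    def code() -> List[float]:
        return [rng.uniform(-1.0, 1.0) for _ in range(CODE_DIM)]

    return [ObjectCodes(code(), code()) for _ in range(num_objects)]


def toy_decoder(codes: ObjectCodes, num_points: int = 16, num_grasps: int = 2):
    """Stand-in for the decoder: codes in, shape points and grasp poses out."""
    rng = random.Random(repr((codes.shape_code, codes.joint_code)))
    points = [(rng.random(), rng.random(), rng.random()) for _ in range(num_points)]
    grasps = [
        Grasp6DoF(
            position=(rng.random(), rng.random(), rng.random()),
            orientation=(rng.uniform(-3.14, 3.14),
                         rng.uniform(-3.14, 3.14),
                         rng.uniform(-3.14, 3.14)),
        )
        for _ in range(num_grasps)
    ]
    return points, grasps


# Usage: a tiny fake RGB-D image (each pixel is r, g, b, depth) with two objects.
scene = [[0.1, 0.2, 0.3, 1.5] for _ in range(4)]
for obj in toy_encoder(scene, num_objects=2):
    points, grasps = toy_decoder(obj)
    print(len(points), "shape points,", len(grasps), "grasp poses")
```

The point of the split is that each object is summarized by a compact pair of codes, and both the reconstructed shape and the grasp poses are decoded from that same pair, so the two outputs stay consistent with each other.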

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

CenterGrasp: Object-Aware Implicit Representation Learning for Simultaneous Shape Reconstruction and 6-DoF Grasp Estimation

Anthropomorphic Grasping with Neural Object Shape Completion

3D Foundation Models Enable Simultaneous Geometry and Pose Estimation of Grasped Objects

3D Articulated Skeleton Extraction Using a Single Consumer-Grade Depth Camera

6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation

ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections

SM$^3$: Self-Supervised Multi-task Modeling with Multi-view 2D Images for Articulated Objects

6D Pose Estimation with Combined Deep Learning and 3D Vision Techniques for a Fast and Accurate Object Grasping

Learning 6-DOF Grasping Interaction via Deep Geometry-aware 3D Representations

6D pose estimation of 3D objects in scenes with mutual similarities and occlusions

AO-Grasp: Articulated Object Grasp Generation

Object Detection and Pose Estimation from RGB and Depth Data for Real-time, Adaptive Robotic Grasping

RPMArt: Towards Robust Perception and Manipulation for Articulated Objects

StrobeNet: Category-Level Multiview Reconstruction of Articulated Objects

Building Rearticulable Models for Arbitrary 3D Objects from 4D Point Clouds

You Only Scan Once: A Dynamic Scene Reconstruction Pipeline for 6-DoF Robotic Grasping of Novel Objects

Robotic Continuous Grasping System by Shape Transformer-Guided Multi-Object Category-Level 6D Pose Estimation

Single-Camera Multi-View 6DoF pose estimation for robotic grasping