One Point, One Object: Simultaneous 3D Object Segmentation and 6-DOF Pose Estimation

Hongsen Liu

2024-06-06

Abstract: We propose a single-shot method for simultaneous 3D object segmentation and 6-DOF pose estimation in pure 3D point cloud scenes, based on the consensus that \emph{one point only belongs to one object}, i.e., each point has the potential to predict the 6-DOF pose of its corresponding object. Unlike recently proposed methods for similar tasks, which rely on 2D detectors to predict the projections of the 3D bounding box corners and must then estimate the 6-DOF pose with a PnP-like spatial transformation, ours is concise enough not to require an additional spatial transformation between different dimensions. Because training data is lacking for many objects, recently proposed 2D detection methods generate training data with a rendering engine and achieve good results. However, rendering in 3D space with 6-DOF poses is relatively difficult. Therefore, we propose an augmented reality technique to generate the training data in a semi-virtual-reality 3D space. The key component of our method is a multi-task CNN architecture that simultaneously predicts the 3D object segmentation and the 6-DOF pose in pure 3D point clouds. For experimental evaluation, we generate expanded training data for two state-of-the-art 3D object datasets \cite{PLCHF}\cite{TLINEMOD} using augmented reality (AR) technology. We evaluate our proposed method on the two datasets. The results show that our method generalizes well to multiple scenarios and provides performance comparable to or better than the state of the art.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper aims to address the problem of object detection and 6-degree-of-freedom (6-DOF) pose estimation in 3D space. Specifically, the research proposes an efficient method that can simultaneously perform 3D object detection and 6-DOF pose estimation in pure 3D point cloud scenes. This method is based on a simple consensus that each point belongs to only one object, so each point has the potential to predict the 6-DOF pose of its corresponding object. The main contributions of the paper are as follows:

1. **Efficient Single-Pass Method**: A concise method is proposed that directly performs point-level predictions on 3D point clouds without converting the irregular point clouds into regular 3D voxel grids or performing step-by-step processing. The core of this method is a multi-task segmentation and prediction network that simultaneously predicts:
   - Point-level semantic segmentation to filter background points and reduce the search space;
   - 3D positions of the vertices of the object's 3D bounding box for estimating the 6-DOF pose transformation;
   - Confidence scores to evaluate the accuracy of the 3D bounding box predictions.
2. **Augmented Reality Technology for Dataset Generation**: An effective dataset generation method based on augmented reality (AR) technology is designed, which can quickly create 3D object recognition datasets for fixed work scenes and generate extended training data for two existing 3D object recognition datasets.
3. **Experimental Validation**: The effectiveness of the proposed method is validated through extensive experiments on two public datasets (LC-HF and LineMod). The results show that the method generalizes well to various scenarios and that its performance is comparable to or even surpasses existing state-of-the-art methods.

In summary, this research proposes a new method that operates directly on 3D point clouds, achieving 3D object detection and 6-DOF pose estimation without complex post-processing steps. Additionally, augmented reality technology is used to generate extra training data to further improve the method's performance.
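To make the three network outputs listed above more concrete, the following is a minimal sketch of how a 6-DOF pose could be recovered from them. It is an illustration under stated assumptions, not the paper's implementation: the function names, the array shapes, the choice of the single highest-confidence point, and the use of a least-squares rigid alignment (the Kabsch algorithm) between the model's bounding-box corners and the predicted corners are all assumptions made here.

```python
import numpy as np


def rigid_transform_from_corners(model_corners, predicted_corners):
    """Least-squares R, t such that predicted ~= R @ model + t (Kabsch).

    Both inputs are (8, 3) arrays of corresponding bounding-box corners.
    """
    model_centroid = model_corners.mean(axis=0)
    pred_centroid = predicted_corners.mean(axis=0)
    # 3x3 cross-covariance of the centered corner sets.
    H = (model_corners - model_centroid).T @ (predicted_corners - pred_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = pred_centroid - R @ model_centroid
    return R, t


def pose_from_point_predictions(is_object, corner_preds, confidences, model_corners):
    """Pick the most confident foreground point and turn its corners into a pose.

    is_object:     (N,) bool, hypothetical per-point segmentation mask
    corner_preds:  (N, 8, 3), hypothetical per-point corner predictions
    confidences:   (N,), hypothetical per-point confidence scores
    model_corners: (8, 3), bounding-box corners in the object's own frame
    Returns (R, t), or None when no point is labeled as the object.
    """
    candidates = np.flatnonzero(is_object)  # drop background points
    if candidates.size == 0:
        return None
    best = candidates[np.argmax(confidences[candidates])]
    return rigid_transform_from_corners(model_corners, corner_preds[best])
```

Because both corner sets live in 3D, the alignment is a closed-form 3D-to-3D fit, which is consistent with the summary's point that no 2D-to-3D (PnP-like) step is needed. Averaging several high-confidence predictions instead of taking only the best one would be an equally plausible design.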

3D Point-to-Keypoint Voting Network for 6D Pose Estimation

ObjectFusion: an Object Detection and Segmentation Framework with RGB-D SLAM and Convolutional Neural Networks

PA-Pose: Partial Point Cloud Fusion Based on Reliable Alignment for 6D Pose Tracking

Zero-Shot 3d Pose Estimation of Unseen Object by Two-Step Rgb-D Fusion

3D Object Segmentation Using Cross-Window Point Transformer with Latent Semantic Boundary Guidance

Two-Phase Approach for Monocular Object Detection and 6-DoF Pose Estimation

Efficient Center Voting for Object Detection and 6D Pose Estimation in 3D Point Cloud

Object Pose Estimation Based on Multi-precision Vectors and Seg-Driven PnP

A Method for Unseen Object Six Degrees of Freedom Pose Estimation Based on Segment Anything Model and Hybrid Distance Optimization

Fine segmentation and difference-aware shape adjustment for category-level 6DoF object pose estimation

Leaping from 2D Detection to Efficient 6DoF Object Pose Estimation.

Unsupervised Joint 3D Object Model Learning and 6D Pose Estimation for Depth-Based Instance Segmentation.

Real-Time and Efficient 6-D Pose Estimation from a Single RGB Image

Learning Deep Network for Detecting 3D Object Keypoints and 6D Poses

An Integrated Framework for 3-D Modeling, Object Detection, and Pose Estimation from Point-Clouds

OnePose: One-Shot Object Pose Estimation Without CAD Models

A Segmentation-Driven Approach for 6D Object Pose Estimation in the Crowd

Sequential 3D Human Pose and Shape Estimation from Point Clouds

A One Stop 3D Target Reconstruction and multilevel Segmentation Method

From Points to Multi-Object 3D Reconstruction