Abstract: Most supervised learning-based pose estimation methods for stacked scenes are trained on massive synthetic datasets. The challenge is that a network learned on the training dataset is often no longer optimal on the testing dataset. To address this problem, we propose a pose regression network, PPR-Net++. It transforms each scene point into a point in the centroid space, followed by a clustering process and a voting process. In the training phase, a mapping function between the network's critical parameter (i.e., the bandwidth of the clustering algorithm) and the compactness of the centroid distributions is obtained. This function is used to adapt the bandwidth between the centroid distributions of two different domains. In addition, to further improve the pose estimation accuracy, the network also predicts the confidence of each point, based on its visibility and pose error. Only points with high confidence are allowed to vote for the final object pose. In experiments, our method is trained on the IPA synthetic dataset and compared with the state-of-the-art algorithm. When tested on the public synthetic Siléane dataset, our method performs better on all eight objects, five of which improve by more than 5% in average precision (AP). On the IPA real dataset, our method outperforms the state-of-the-art algorithm by a large margin of 20%. This lays a solid foundation for robot grasping in industrial scenarios.

Note to Practitioners: Our work is motivated by industrial product assembly based on robot grasping. The industrial parts are usually manufactured by numerical machines and piled in bins. Our method can estimate the poses of visible parts accurately. The pose of a part consists of its centroid and spatial orientation. Combined with a depth camera, this algorithm allows an industrial robot to understand complex stacked scenes. We improve the pose estimation accuracy so that parts can be assembled by robot grasping without an additional pose adjuster. Our network can learn from a synthetic dataset and be applied to real-world data without a significant accuracy drop. The synthetic dataset can be obtained easily by computer simulation programs, so the training data are sufficient. Experiments demonstrate that our method outperforms state-of-the-art pose estimation approaches.
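
The clustering and voting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a flat-kernel mean shift as the bandwidth-based clustering algorithm, a hypothetical confidence threshold of 0.5, and voting by simple averaging of centroids only (orientation is omitted). The function names `mean_shift`, `group_modes`, and `estimate_centroids` are illustrative, and the per-point centroid predictions and confidences are assumed to come from the network.

```python
import math


def mean_shift(points, bandwidth, iters=30):
    """Flat-kernel mean shift: repeatedly move each mode to the mean of
    the input points lying within `bandwidth` of it."""
    modes = [tuple(p) for p in points]
    for _ in range(iters):
        shifted = []
        for m in modes:
            near = [p for p in points if math.dist(m, p) <= bandwidth]
            if not near:  # guard against floating-point edge cases
                shifted.append(m)
                continue
            shifted.append(tuple(sum(c) / len(near) for c in zip(*near)))
        modes = shifted
    return modes


def group_modes(modes, tol):
    """Assign a cluster label to each mode; modes within `tol` of an
    existing cluster representative share its label."""
    labels, reps = [], []
    for m in modes:
        for k, r in enumerate(reps):
            if math.dist(m, r) <= tol:
                labels.append(k)
                break
        else:
            reps.append(m)
            labels.append(len(reps) - 1)
    return labels


def estimate_centroids(pred_centroids, confidences, bandwidth, conf_threshold=0.5):
    """Cluster per-point centroid predictions, then let only the
    high-confidence points in each cluster vote (by averaging)."""
    modes = mean_shift(pred_centroids, bandwidth)
    labels = group_modes(modes, tol=bandwidth / 2)  # assumed merge tolerance
    results = []
    for k in range(max(labels) + 1):
        voters = [p for p, l, c in zip(pred_centroids, labels, confidences)
                  if l == k and c >= conf_threshold]
        if voters:  # clusters without confident voters are skipped
            results.append(tuple(sum(c) / len(voters) for c in zip(*voters)))
    return results


if __name__ == "__main__":
    # Toy data: two objects; the fourth point is a low-confidence outlier.
    pts = [(0.0, 0, 0), (0.02, 0, 0), (-0.02, 0, 0), (0.08, 0, 0),
           (1.0, 0, 0), (1.02, 0, 0), (0.98, 0, 0)]
    conf = [0.9, 0.9, 0.9, 0.1, 0.8, 0.8, 0.8]
    print(estimate_centroids(pts, conf, bandwidth=0.2))
```

In this sketch the bandwidth is a fixed input. The abstract's domain adaptation step would instead choose it from the compactness of the predicted centroid distribution, via the mapping function obtained during training.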

LWOSNet: A Lightweight One-Shot Network Framework for Object Pose Estimation

3D Point-to-Keypoint Voting Network for 6D Pose Estimation

Object Pose Estimation Based on Multi-precision Vectors and Seg-Driven PnP

Deep-6DPose: Recovering 6D Object Pose from a Single RGB Image

A Robust CoS-PVNet Pose Estimation Network in Complex Scenarios

A Lightweight Method of Pose Estimation for Indoor Object

RNNPose: 6-DoF Object Pose Estimation Via Recurrent Correspondence Field Estimation and Pose Optimization

Deep instance segmentation and 6D object pose estimation in cluttered scenes for robotic autonomous grasping

Attention Guided 6D Object Pose Estimation with Multi-constraints Voting Network

PAV-Net: Point-wise Attention Keypoints Voting Network for Real-time 6D Object Pose Estimation

OnePose: One-Shot Object Pose Estimation Without CAD Models

Semantic Segmentation and 6DoF Pose Estimation using RGB-D Images and Deep Neural Networks

LSDNet: lightweight stochastic depth network for human pose estimation

A RGB-D Based 6D Object Pose Estimation and Its Application in Robotic Grasping

PVNet: Pixel-Wise Voting Network for 6dof Object Pose Estimation

PPR-Net++: Accurate 6-D Pose Estimation in Stacked Scenarios

End-to-End 6dof Pose Estimation from Monocular RGB Images

DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency

SEMPose: A Single End-to-end Network for Multi-object Pose Estimation

Sparse Convolution Based 6D Pose Estimation for Robotic Bin-Picking with Point Clouds

CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-Dof Object Pose Estimation