Abstract: Most supervised learning-based pose estimation methods for stacked scenes are trained on massive synthetic datasets. The usual challenge is that a network learned on the training dataset is no longer optimal on the test dataset. To address this problem, we propose a pose regression network, PPR-Net++. It transforms each scene point into a point in the centroid space, followed by a clustering process and a voting process. In the training phase, a mapping function between the network's critical parameter (i.e., the bandwidth of the clustering algorithm) and the compactness of the centroid distributions is obtained. This function is used to adapt the bandwidth between the centroid distributions of two different domains. In addition, to further improve pose estimation accuracy, the network also predicts the confidence of each point, based on its visibility and pose error. Only points with high confidence have the right to vote for the final object pose. In experiments, our method is trained on the IPA synthetic dataset and compared with the state-of-the-art algorithm. When tested on the public synthetic Siléane dataset, our method performs better on all eight objects, five of which improve by more than 5% in average precision (AP). On the IPA real dataset, our method outperforms the state-of-the-art algorithm by a large margin of 20%. This lays a solid foundation for robot grasping in industrial scenarios.

Note to Practitioners: Our work is motivated by industrial product assembly based on robot grasping. Industrial parts are usually manufactured by numerical machines and piled in bins. Our method can accurately estimate the poses of visible parts. The pose of a part comprises its centroid and its spatial orientation. Combined with a depth camera, this algorithm allows an industrial robot to understand complex stacked scenes. We improve the pose estimation accuracy so that parts can be assembled by robot grasping without an additional pose adjuster. Our network can learn from a synthetic dataset and be applied to real-world data without a significant accuracy drop. The synthetic dataset can be obtained easily with computer simulation programs, so training data are plentiful. Experiments demonstrate that our method outperforms state-of-the-art pose estimation approaches.
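The cluster-then-vote step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a flat-kernel mean shift stands in for the clustering algorithm (the abstract only says that it has a bandwidth), only centroids are voted on rather than full poses, and the function names, the `min_conf` threshold, and the toy data are all hypothetical. The bandwidth is passed in directly. The mapping from centroid compactness to bandwidth is not reproduced here because the abstract does not give its form.

```python
import math

def mean_shift_modes(points, bandwidth, iters=20):
    """Flat-kernel mean shift: move each point to the mean of its neighbours."""
    pts = [tuple(p) for p in points]
    modes = list(pts)
    for _ in range(iters):
        shifted = []
        for m in modes:
            nbrs = [p for p in pts if math.dist(m, p) <= bandwidth]
            if not nbrs:  # guard against floating-point edge cases
                shifted.append(m)
                continue
            shifted.append(tuple(sum(axis) / len(nbrs) for axis in zip(*nbrs)))
        modes = shifted
    return modes

def label_modes(modes, merge_radius):
    """Give the same label to modes that converged close to each other."""
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) <= merge_radius:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

def vote_centroids(pred_centroids, confidences, bandwidth, min_conf=0.5):
    """Cluster per-point centroid predictions, then let confident points vote.

    min_conf is a hypothetical threshold, not a value from the paper.
    """
    modes = mean_shift_modes(pred_centroids, bandwidth)
    labels = label_modes(modes, merge_radius=bandwidth / 2)
    results = {}
    for label in sorted(set(labels)):
        voters = [
            tuple(c)
            for c, l, s in zip(pred_centroids, labels, confidences)
            if l == label and s >= min_conf
        ]
        if voters:  # a cluster with no confident point yields no estimate
            results[label] = tuple(sum(axis) / len(voters) for axis in zip(*voters))
    return results

# Toy data: per-point centroid predictions for two stacked parts.
predictions = [
    (0.00, 0.0, 0.0), (0.02, 0.0, 0.0), (0.10, 0.0, 0.0),  # part A
    (1.00, 0.0, 0.0), (0.98, 0.0, 0.0),                    # part B
]
confidences = [0.9, 0.8, 0.1, 0.9, 0.7]  # the third point is unreliable

estimates = vote_centroids(predictions, confidences, bandwidth=0.2)
for label, centroid in estimates.items():
    print(label, centroid)
```

In this toy run the low-confidence prediction still joins the cluster for part A but is excluded from the vote, so it does not pull the estimated centroid away from the confident predictions.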

Object Pose Estimation Based on Multi-precision Vectors and Seg-Driven PnP

3D Point-to-Keypoint Voting Network for 6D Pose Estimation

Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

SEMPose: A Single End-to-end Network for Multi-object Pose Estimation

Real-Time and Efficient 6-D Pose Estimation from a Single RGB Image

MFPN-6D: Real-time One-stage Pose Estimation of Objects on RGB Images

CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-Dof Object Pose Estimation

Attention Guided 6D Object Pose Estimation with Multi-constraints Voting Network

Semantic keypoint-based pose estimation from single RGB frames

Six-Degree-of-Freedom Pose Estimation Method for Multi-Source Feature Points Based on Fully Convolutional Neural Network

PVNet: Pixel-Wise Voting Network for 6dof Object Pose Estimation

NVR-Net: Normal Vector Guided Regression Network for Disentangled 6D Pose Estimation

DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency

RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images

Improved Stacked Hourglass Network for Robust 6D Object Pose Estimation

RNNPose: 6-DoF Object Pose Estimation Via Recurrent Correspondence Field Estimation and Pose Optimization

Leaping from 2D Detection to Efficient 6DoF Object Pose Estimation

Estimating 6D Pose From Localizing Designated Surface Keypoints

Triangulate Geometric Constraint Combined with Visual-Flow Fusion Network for Accurate 6dof Pose Estimation

PPR-Net++: Accurate 6-D Pose Estimation in Stacked Scenarios

Out-of-region Keypoint Localization for 6D Pose Estimation