Abstract: 6-D pose estimation is an important branch of vision measurement and is widely used in robotics, autonomous driving, and augmented reality. A recent research trend is to train a deep neural network to predict the 2-D projections of 3-D keypoints directly from the image, establish 2-D-to-3-D correspondences, and then recover the pose with the perspective-n-point (PnP) algorithm. The current challenges are that detection accuracy drops when objects are textureless or occluded or when the scene is cluttered, and that most existing models are too large to meet real-time requirements. In this article, we introduce a densely connected feature pyramid network (DFPN) that can efficiently integrate and utilize features. We combine the cross-stage partial network (CSPNet) with DFPN to design DFPN-6-D, a new network for 6-D object pose estimation. DFPN-6-D can efficiently and accurately handle textureless objects, occlusion, and scene clutter, and estimates full 6-D poses in a single shot. Furthermore, we propose a new confidence calculation method and loss function for object pose estimation that fully consider spatial information. Finally, we propose a novel augmentation method for direct 6-D pose estimation approaches, called 6-D augmentation, which improves performance and generalization under occlusion. Our approach achieves new state-of-the-art accuracies of 98.06 and 87.09 in terms of the ADD(-S) metric on the Linemod and Occluded-Linemod datasets, respectively, and also achieves the best results under the respective metrics on the MULT-I, BIN-P, and T-LESS datasets, while still running end-to-end at over 65 frames/s. The experimental results demonstrate that our algorithm is robust to textureless materials and occlusion while running more efficiently than other methods. We also deploy the proposed method on a real robot to grasp and manipulate objects based on the estimated pose.
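
For readers unfamiliar with the ADD(-S) metric quoted above, the following is a minimal sketch of how it is commonly computed; it is not taken from the article. ADD averages the distance between corresponding model points under the ground-truth and predicted poses, while ADD-S (used for symmetric objects) averages the distance from each ground-truth point to its closest predicted point. The function names, the toy point set, and the 10%-of-diameter acceptance threshold are assumptions made for illustration; the threshold is a widely used convention that the abstract itself does not state.

```python
import math

def transform(points, R, t):
    """Apply the rigid transform x -> R x + t to a list of 3-D points."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def add_distance(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding transformed model points."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)

def adds_distance(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its closest
    predicted point, so symmetric-equivalent poses are not penalized."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(min(math.dist(a, b) for b in pred) for a in gt) / len(points)

def pose_is_correct(distance, diameter, threshold=0.1):
    """Assumed convention: accept the pose if the distance is below
    threshold * object diameter (10% by default)."""
    return distance < threshold * diameter

# Toy model (hypothetical): four points symmetric under a 180-degree
# rotation about the z axis; its diameter is 2.
MODEL = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT_Z_180 = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
ZERO = (0, 0, 0)

add_sym = add_distance(MODEL, IDENTITY, ZERO, ROT_Z_180, ZERO)
adds_sym = adds_distance(MODEL, IDENTITY, ZERO, ROT_Z_180, ZERO)
print(add_sym, adds_sym)
```

In this toy case the predicted pose differs from the ground truth only by a symmetry of the object, so ADD penalizes it heavily while ADD-S treats it as a perfect match. The accuracy reported under ADD(-S) is then the fraction of test poses accepted by a check such as `pose_is_correct`, using ADD for asymmetric objects and ADD-S for symmetric ones.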

Object Pose Estimation from RGB-D Images with Affordance-Instance Segmentation Constraint for Semantic Robot Manipulation

Semantic Part Segmentation Method Based 3D Object Pose Estimation with RGB-D Images for Bin-Picking

Semantic Segmentation and 6DoF Pose Estimation using RGB-D Images and Deep Neural Networks

Zero-Shot 3D Pose Estimation of Unseen Object by Two-Step RGB-D Fusion

Fine segmentation and difference-aware shape adjustment for category-level 6DoF object pose estimation

A Geometry-Enhanced 6D Pose Estimation Network with Incomplete Shape Recovery for Industrial Parts

Real-Time and Efficient 6-D Pose Estimation from a Single RGB Image

Object Pose Estimation Based on Multi-precision Vectors and Seg-Driven PnP

Attention Guided 6D Object Pose Estimation with Multi-constraints Voting Network

LWOSNet: A Lightweight One-Shot Network Framework for Object Pose Estimation

Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation

SEMPose: A Single End-to-end Network for Multi-object Pose Estimation

A Method for Unseen Object Six Degrees of Freedom Pose Estimation Based on Segment Anything Model and Hybrid Distance Optimization

RGB-D-Based Pose Estimation of Workpieces with Semantic Segmentation and Point Cloud Registration

Improving 6D Object Pose Estimation Based on Semantic Segmentation

Robot Unknown Objects Instance Segmentation Based on Collaborative Weight Assignment RGB–Depth Fusion Strategy

A RGB-D Based 6D Object Pose Estimation and Its Application in Robotic Grasping

Semantic keypoint-based pose estimation from single RGB frames

SilhoNet: An RGB Method for 6D Object Pose Estimation

DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency

Object affordance detection with boundary-preserving network for robotic manipulation tasks