Abstract: 6-D pose estimation is an important branch of vision measurement and is widely used in robotics, autonomous driving, and augmented reality. The latest research trend in 6-D pose estimation is to train a deep neural network to predict the 2-D projections of 3-D keypoints directly from the image, establish the 2-D to 3-D correspondences, and then recover the pose with the perspective-n-point (PnP) algorithm. The current challenge is that detection accuracy drops when objects are textureless or occluded or when the scene is cluttered, and most existing models are too large to meet real-time requirements. In this article, we introduce a densely connected feature pyramid network (DFPN) that can efficiently integrate and utilize features. We combine the cross-stage partial network (CSPNet) with DFPN to design DFPN-6-D, a new network for 6-D object pose estimation. DFPN-6-D efficiently and accurately handles textureless objects, occlusion, and scene clutter, and estimates full 6-D poses in a single shot. Furthermore, we propose a new confidence calculation method and loss function for object pose estimation that fully consider spatial information. Finally, we propose a novel augmentation method for direct 6-D pose estimation approaches, called 6-D augmentation, which improves performance and generalization under occlusion. Our approach achieves new state-of-the-art accuracies of 98.06 and 87.09 in terms of the ADD(-S) metric on the Linemod and Occluded-Linemod datasets, respectively, and also achieves the best results under the respective metrics on the MULT-I, BIN-P, and T-LESS datasets, while still running end-to-end at over 65 frames/s. The experimental results demonstrate that our algorithm is robust to textureless materials and occlusion while running more efficiently than other methods. We also deploy the proposed method on a real robot to grasp and manipulate objects based on the estimated pose.
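
For readers unfamiliar with the ADD(-S) metric quoted above, the following is a minimal pure-Python sketch of how it is commonly computed; it is not taken from the paper. The function names (`transform`, `add_metric`, `add_s_metric`, `is_correct`) are hypothetical, and the 10% of model diameter threshold is the usual convention in the literature, assumed here rather than stated in the abstract.

```python
import math

def transform(points, R, t):
    """Apply rotation R (3x3 nested lists) and translation t (length 3) to 3-D points."""
    return [
        tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def add_metric(points, R_gt, t_gt, R_est, t_est):
    """ADD: mean distance between corresponding model points under the two poses."""
    gt = transform(points, R_gt, t_gt)
    est = transform(points, R_est, t_est)
    return sum(math.dist(a, b) for a, b in zip(gt, est)) / len(points)

def add_s_metric(points, R_gt, t_gt, R_est, t_est):
    """ADD-S: mean distance from each ground-truth point to its closest estimated
    point, used for symmetric objects whose point correspondences are ambiguous."""
    gt = transform(points, R_gt, t_gt)
    est = transform(points, R_est, t_est)
    return sum(min(math.dist(a, b) for b in est) for a in gt) / len(points)

def is_correct(distance, diameter, threshold=0.1):
    """Assumed convention: a pose counts as correct if the distance is below
    threshold * model diameter (10% by default)."""
    return distance < threshold * diameter
```

Under this convention, the reported accuracy would be the fraction of test images for which `is_correct` holds, with ADD applied to asymmetric objects and ADD-S to symmetric ones. For a model that is symmetric under a 180-degree rotation, ADD can be large while ADD-S stays at zero, which is why the two variants are combined.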

Progress and limitations of deep networks to recognize objects in unusual poses

A comparison between humans and AI at recognizing objects in unusual poses

Pose Augmentation: Class-agnostic Object Pose Transformation for Object Recognition

Modelling Human Body Pose for Action Recognition Using Deep Neural Networks

Seeing eye-to-eye? A comparison of object recognition performance in humans and deep convolutional neural networks under image manipulation

Recognizing Objects In-the-wild: Where Do We Stand?

Computer Vision: History, the Rise of Deep Networks, and Future Vistas Panel on Perception and Cognition, MORS Meeting on Artificial Intelligence and Autonomy

Humans and deep networks largely agree on which kinds of variation make object recognition harder

RFF-PoseNet: A 6D Object Pose Estimation Network Based on Robust Feature Fusion in Complex Scenes

Are Deep Learning Models Robust to Partial Object Occlusion in Visual Recognition Tasks?

Real-Time and Efficient 6-D Pose Estimation from a Single RGB Image

Grasping Pose Detection for Loose Stacked Object Based on Convolutional Neural Network with Multiple Self-Powered Sensors Information

Learning Deep Network for Detecting 3D Object Keypoints and 6D Poses

Partial success in closing the gap between human and machine vision

LPNet: Retina Inspired Neural Network for Object Detection and Recognition

Extreme Image Transformations Facilitate Robust Latent Object Representations

Multimodal deep learning for robust RGB-D object recognition

Deep Learning-Based Object Pose Estimation: A Comprehensive Survey

Analysing the Effects of Pooling Combinations on Invariance to Position and Deformation in Convolutional Neural Networks

3D Human Pose, Shape and Texture from Low-Resolution Images and Videos

3D_DEN: Open-ended 3D Object Recognition using Dynamically Expandable Networks