Attention Guided 6D Object Pose Estimation with Multi-constraints Voting Network

Guoyu Zuo, Zonghan Gu, Gao Huang, Daoxiong Gong
DOI: https://doi.org/10.1007/978-3-031-23609-9_4
2022-01-01
Abstract: In vision-based robotic manipulation, real-time and accurate pose estimation of target objects remains challenging under cluttered backgrounds, illumination variations, occlusion, and weak texture, and especially under severe occlusion. In recent years, RGB-based methods built on vector field prediction have proved robust for 6D object pose estimation under occlusion. At the same time, networks with attention mechanisms have achieved outstanding performance in 2D object detection. In this paper, we propose an attention-driven 6D pose estimation method with a multi-constraints loss and pixel-wise voting. We compute the distance-weighted unit vector length and included angle length from the prediction results to regularize the unit vector prediction. Moreover, we introduce Dense Atrous Spatial Pyramid Pooling (DenseASPP) and Channel-wise Cross Attention (CCA) mechanisms into the network structure to improve the accuracy of the output prediction. Experiments on the LINEMOD and Occlusion LINEMOD datasets show that our method outperforms state-of-the-art two-stage sparse 2D keypoint prediction methods without pose refinement.
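To make the idea of pixel-wise voting on a unit vector field concrete, here is a minimal sketch of how per-pixel direction predictions could be aggregated into one 2D keypoint. It is not the paper's voting scheme, which the abstract does not detail: it swaps in a plain least-squares intersection of the voting lines, and the function name `vote_keypoint` and its inputs are hypothetical.

```python
import numpy as np

def vote_keypoint(pixels, directions):
    """Estimate a 2D keypoint from per-pixel direction votes.

    Hypothetical helper: each pixel p_i casts a line through itself along
    the direction d_i. The returned point minimizes the summed squared
    perpendicular distance to all of those lines.
    """
    pixels = np.asarray(pixels, dtype=float)          # shape (N, 2)
    directions = np.asarray(directions, dtype=float)  # shape (N, 2), nonzero rows
    # Normalize so every vote is a unit vector.
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    # Per-pixel projector onto the line's normal: I - d d^T, shape (N, 2, 2).
    proj = np.eye(2)[None, :, :] - directions[:, :, None] * directions[:, None, :]
    # Normal equations: (sum_i P_i) x = sum_i P_i p_i.
    A = proj.sum(axis=0)
    b = np.einsum("nij,nj->i", proj, pixels)
    # Fails with LinAlgError if all votes are parallel (A is singular).
    return np.linalg.solve(A, b)
```

A least-squares fit like this is sensitive to outlier votes, which is one reason a robust scheme (for example, sampling hypotheses and scoring them by agreement with the vector field) may be preferred in heavily occluded scenes.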