PAV-Net: Point-wise Attention Keypoints Voting Network for Real-time 6D Object Pose Estimation

Junnan Huang, Chongkun Xia, Houde Liu, Bin Liang
DOI: https://doi.org/10.1109/ijcnn55064.2022.9892089
2022-01-01
Abstract: In this paper, we propose a novel real-time 6D object pose estimation framework based on a Point-wise Attention Keypoints Voting Network (PAV-Net). Unlike previous methods that use all features indiscriminately, we evaluate and integrate the features of visible points before estimation to deal with the unstructured and uneven nature of point-wise features. Specifically, we first roughly locate the object by object detection and transform the captured point cloud coordinates to the local center. We then extract point-wise features from the RGB image and the point cloud separately and perform semantic segmentation. Finally, the point-wise features are screened and integrated through attention keypoint voting to predict accurate keypoint coordinates, and the 6D object pose is obtained by keypoint fitting. Point-wise attention voting effectively avoids external interference and makes more efficient use of influential point features, so the framework needs only a simple feature extraction network and therefore achieves better real-time performance. Extensive experiments confirm this conclusion and show that, on the LineMOD and YCB-Video datasets, the proposed framework outperforms other real-time pose estimation methods running at the same speed.
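The last two stages described in the abstract (attention-weighted keypoint voting, then keypoint fitting) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the voting rule or the fitting solver, so the sketch assumes a softmax-weighted average of per-point votes and a standard least-squares rigid fit (the Kabsch/SVD method). The names `attention_vote` and `fit_pose`, the array shapes, and the synthetic data are all hypothetical.

```python
import numpy as np


def attention_vote(points, offsets, scores):
    """Aggregate per-point keypoint votes with softmax attention weights.

    points:  (N, 3) visible object points
    offsets: (N, K, 3) predicted offsets from each point to each of K keypoints
    scores:  (N, K) unnormalized attention scores (assumed network output)
    returns: (K, 3) voted keypoint coordinates
    """
    votes = points[:, None, :] + offsets                  # (N, K, 3)
    scores = scores - scores.max(axis=0, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)         # softmax over points
    return (weights[:, :, None] * votes).sum(axis=0)


def fit_pose(model_kps, scene_kps):
    """Least-squares rigid fit (Kabsch): scene_kps ~ R @ model_kps + t."""
    mu_m = model_kps.mean(axis=0)
    mu_s = scene_kps.mean(axis=0)
    H = (model_kps - mu_m).T @ (scene_kps - mu_s)         # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))                # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_s - R @ mu_m
    return R, t


# Synthetic check: build a known pose, simulate noisy votes, recover the pose.
rng = np.random.default_rng(0)
model_kps = rng.normal(size=(8, 3))                       # keypoints in the object frame
c, s = np.cos(0.5), np.sin(0.5)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.5])
scene_kps_true = model_kps @ R_true.T + t_true

points = rng.normal(size=(200, 3))                        # stand-in for visible points
offsets = scene_kps_true[None, :, :] - points[:, None, :]
offsets = offsets + rng.normal(scale=0.01, size=offsets.shape)  # simulated prediction noise
scores = rng.normal(size=(200, 8))                        # stand-in attention scores

voted_kps = attention_vote(points, offsets, scores)
R, t = fit_pose(model_kps, voted_kps)
print("rotation error:", np.abs(R - R_true).max())
print("translation error:", np.abs(t - t_true).max())
```

In this sketch the attention weights only decide how much each point's vote counts. A trained network would presumably assign low scores to occluded or unreliable points, which is how the screening described in the abstract could reduce outside interference.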