6D object pose estimation based on dense convolutional object center voting with improved accuracy and efficiency

Faheem Ullah, Wu Wei, Zhun Fan, Qiuda Yu
DOI: https://doi.org/10.1007/s00371-023-03113-4
IF: 2.835
2023-11-25
The Visual Computer
Abstract: 6D object pose estimation is an important application of computer vision and a basic module in robotic manipulation, but occlusion in cluttered environments, object symmetries, and textureless surfaces remain real challenges, as do accuracy and efficiency. Recent two-stage methods perform well in terms of accuracy; however, their runtime increases linearly with the number of objects in a scene. This paper proposes a fully convolutional, parallel architecture that obtains the 3D translation and orientation of object poses from the same pixel-wise dense estimation. It uses the same voting block for inlier selection across multiple instances, final 3D translation estimation, and quaternion aggregation. Estimating only the center point of each object reduces the model's running time while remaining useful under occlusion and on textureless objects. Symmetries and object varieties are handled with a loss function based on shape matching and the pose of the object. The proposed approach has fewer parameters and takes less time to train and evaluate while remaining accurate. Experiments on the LINEMOD and Occlusion LINEMOD datasets using the ADD(-S) and 2D projection evaluation metrics show that the proposed method outperforms state-of-the-art approaches for 6D pose estimation.
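The ADD(-S) metrics named in the abstract can be illustrated with a minimal sketch. It follows the commonly used definitions (ADD: mean distance between corresponding model points under the ground-truth and predicted poses; ADD-S: mean distance from each ground-truth point to its nearest predicted point, used for symmetric objects) and is not the authors' evaluation code. The function names `transform`, `add_metric`, and `add_s_metric` are hypothetical, and the brute-force nearest-neighbor search is an assumption made for brevity.

```python
import numpy as np


def transform(points, R, t):
    """Apply rotation R (3x3) and translation t (3,) to an (N, 3) point array."""
    return points @ R.T + t


def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding model points under both poses."""
    diff = transform(points, R_gt, t_gt) - transform(points, R_pred, t_pred)
    return float(np.linalg.norm(diff, axis=1).mean())


def add_s_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to the nearest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    # Brute-force (N, N) pairwise distances; a KD-tree would scale better for large models.
    dists = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


# Hypothetical toy model: four points that are symmetric under a 180-degree turn about z.
model = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
R_identity = np.eye(3)
R_half_turn = np.diag([-1.0, -1.0, 1.0])
t_zero = np.zeros(3)

print(add_metric(model, R_identity, t_zero, R_half_turn, t_zero))
print(add_s_metric(model, R_identity, t_zero, R_half_turn, t_zero))
```

For this toy model the half-turn maps the point set onto itself, so ADD penalizes the prediction while ADD-S does not, which is why the symmetric variant is used for symmetric objects.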
computer science, software engineering