Spatial and temporal consistency learning for monocular 6D pose estimation
Hong-Bo Zhang, Jia-Yu Liang, Jia-Xin Hong, Qing Lei, Jing-Hua Liu, Ji-Xiang Du
DOI: https://doi.org/10.1016/j.engappai.2023.107803
IF: 8
2024-01-07
Engineering Applications of Artificial Intelligence
Abstract: Monocular 6D pose estimation is a challenging task in computer vision and robotics. Many previous works take only a cropped image of a single object as input during training and inference, aiming to remove noise from non-object regions. However, most of these methods ignore the viewpoint and the spatial relationships among objects in the scene, which are crucial for accurate estimation of the camera pose. To address this issue, this paper proposes a novel multi-view, multi-object learning strategy for monocular 6D pose estimation, which enforces the consistency of object coordinates for the same object at different viewpoints and the consistency of world coordinates for different objects in the same space. In the proposed method, spatial and temporal groups are generated to train the monocular 6D pose estimation network. Because the camera moves, scene images taken at different times can be regarded as images captured from different viewpoints. Therefore, a temporal consistency loss is designed to constrain the relationship of the same object at different viewpoints, while a spatial consistency loss is designed to constrain the relationship of different objects in the same space. Finally, the proposed method is evaluated on public datasets. Experimental results show that the proposed method is accurate and robust, and that it outperforms comparable state-of-the-art approaches.
automation & control systems; computer science, artificial intelligence; engineering, electrical & electronic, multidisciplinary
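
The two consistency constraints named in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's formulation: the abstract gives no equations, so the function names, the use of 4x4 object-to-camera poses, and the assumption that object-to-world poses are known for each scene are all hypothetical choices made for illustration. The temporal term maps corresponding camera-frame points from two viewpoints back into the object frame and penalizes disagreement; the spatial term checks that every object in one image implies the same camera pose in the world frame.

```python
import numpy as np


def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(theta):
    """Rotation by theta radians about the z axis (helper for the demo)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def to_object_frame(T_cam_obj, pts_cam):
    """Map camera-frame points (N, 3) into the object frame.

    T_cam_obj is the estimated object-to-camera pose, so the inverse
    mapping is R^T (p - t), written here for row vectors.
    """
    R, t = T_cam_obj[:3, :3], T_cam_obj[:3, 3]
    return (pts_cam - t) @ R


def temporal_consistency_loss(T_view1, T_view2, pts_cam1, pts_cam2):
    """Assumed temporal term: the same surface points seen from two
    viewpoints should map to the same object coordinates."""
    x1 = to_object_frame(T_view1, pts_cam1)
    x2 = to_object_frame(T_view2, pts_cam2)
    return float(np.mean(np.linalg.norm(x1 - x2, axis=1)))


def spatial_consistency_loss(T_cam_objs, T_world_objs):
    """Assumed spatial term: each object implies a camera pose in the world
    frame, T_world_cam = T_world_obj @ inv(T_cam_obj); all objects in one
    image should agree. Returns the mean pairwise Frobenius distance."""
    implied = [Tw @ np.linalg.inv(Tc) for Tc, Tw in zip(T_cam_objs, T_world_objs)]
    diffs = [
        np.linalg.norm(implied[i] - implied[j])
        for i in range(len(implied))
        for j in range(i + 1, len(implied))
    ]
    return float(np.mean(diffs)) if diffs else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts_obj = rng.normal(size=(8, 3))  # hypothetical object model points

    # Two viewpoints of the same object (temporal group).
    T1 = make_pose(rot_z(0.3), [0.1, 0.0, 1.0])
    T2 = make_pose(rot_z(-0.5), [-0.2, 0.1, 1.2])
    cam1 = pts_obj @ T1[:3, :3].T + T1[:3, 3]
    cam2 = pts_obj @ T2[:3, :3].T + T2[:3, 3]
    print("temporal loss:", temporal_consistency_loss(T1, T2, cam1, cam2))

    # Two objects seen by one camera (spatial group).
    T_world_cam = make_pose(rot_z(0.7), [0.5, -0.3, 2.0])
    T_world_objs = [make_pose(rot_z(0.2), [1.0, 0.0, 0.0]),
                    make_pose(rot_z(-1.1), [0.0, 1.0, 0.0])]
    T_cam_objs = [np.linalg.inv(T_world_cam) @ Tw for Tw in T_world_objs]
    print("spatial loss:", spatial_consistency_loss(T_cam_objs, T_world_objs))
```

With exact poses both terms are zero up to floating-point error, and they grow as any estimated pose drifts. In a training setting these quantities would be computed on network predictions in a differentiable framework; plain NumPy is used here only to keep the sketch self-contained.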