Cross-domain Fusion and Embedded Refinement-Based 6D Object Pose Tracking on Textureless Objects
Jichun Wang, Guifang Duan, Yang Wang, Guodong Yi, Liangyu Dong, Zili Wang, Xuewei Zhang, Shuyou Zhang
DOI: https://doi.org/10.1007/s10845-023-02316-9
IF: 8.3
2024-01-01
Journal of Intelligent Manufacturing
Abstract: In industrial production, accurately perceiving the position and orientation of target objects allows certain production processes to be generalized to unstructured scenarios, thereby facilitating intelligent manufacturing. 6D object pose tracking aims to achieve real-time, accurate, and long-term pose estimation from a video sequence. In this paper, we introduce a novel RGB-based 6D object pose tracking method that leverages temporal information. Our approach centers on a network that predicts the pose residual between two consecutive image frames. Because industrial objects often have weak textures and complex shapes, we incorporate a cross-domain attention fusion module in the feature fusion phase to capture pixel-level correspondences between different feature representations. This module improves robustness to illumination variations and occlusion. Additionally, we propose a simple yet effective pose regression module, referred to as the embedded refinement module, which takes the deviation of previous pose estimates into account. This module partially mitigates the cumulative pose estimation deviation caused by large movements. We conduct comparative experiments on the YCB dataset, the Fast-YCB dataset, and a customized dataset designed for robotic-arm manipulation of industrial parts. The results demonstrate that the proposed method surpasses state-of-the-art techniques and achieves robust, long-term tracking.
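
The residual-based tracking idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the pose residual is parameterized, so the sketch assumes poses are 4x4 homogeneous transforms and that the residual is applied by left-multiplication; the helper names (make_pose, rot_z, pose_residual, apply_residual, track) are hypothetical, and the learned network is replaced by a residual computed from known poses.

```python
import numpy as np


def make_pose(R, t):
    """Build a 4x4 homogeneous pose from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def pose_residual(T_prev, T_curr):
    """Residual transform taking the previous pose to the current one.

    Assumption: left-multiplicative convention, T_curr = delta @ T_prev.
    In a tracker this quantity would be predicted by a network, not computed.
    """
    return T_curr @ np.linalg.inv(T_prev)


def apply_residual(T_prev, delta):
    """Update the previous pose estimate with a (predicted) residual."""
    return delta @ T_prev


def track(T_init, residuals):
    """Chain per-frame residuals starting from an initial pose.

    Errors in each residual compound across frames, which is the cumulative
    deviation the abstract refers to.
    """
    poses = [T_init]
    for delta in residuals:
        poses.append(apply_residual(poses[-1], delta))
    return poses


if __name__ == "__main__":
    T0 = make_pose(rot_z(0.1), np.array([0.0, 0.0, 0.5]))
    T1 = make_pose(rot_z(0.3), np.array([0.02, 0.0, 0.5]))
    delta = pose_residual(T0, T1)
    print(np.allclose(apply_residual(T0, delta), T1))
```

Because each frame's estimate is built on the previous one, any error in a residual propagates to all later frames; this is why a refinement step that accounts for earlier deviations can help in long sequences.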