Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

Jun Wu, Xiangyu Ru, Sha Lu, Rong Xiong, Yue Wang
DOI: https://doi.org/10.1109/tim.2024.3414940
IF: 5.6
2024-01-01
IEEE Transactions on Instrumentation and Measurement
Abstract: Accurately estimating the 6D pose of objects is a fundamental measurement task for robotic manipulation. Using multi-view RGB observations to estimate poses is a reasonable way to handle scene occlusion and monocular scale ambiguity. Current multi-view RGB object pose estimation methods mostly follow back-end processing pipelines, which are sensitive to outliers and cannot run in real time. In this paper, we propose a recurrent multi-view RGB object pose estimation framework that leverages a 3D volume-based feature fusion mechanism. By employing a recurrent architecture, we achieve high-quality multi-frame pose estimation while still running on the fly. By adopting volumes to fuse features in 3D space, we implement a dense and geometrically regularized real-time fusion module for multi-view RGB features. To address the limitations of current filter-based methods when fusing observations with non-unimodal distributions, we use nonlinear Gated Recurrent Units (GRUs) to fuse multi-modal distributions. We also design a detection-style self-assessment module that enables accurate estimates within local key regions rather than relying on global perception. Experiments on the LineMOD, Occlusion LineMOD, and YCB Video datasets demonstrate that the proposed method effectively and efficiently improves multi-view object pose estimation performance.
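To make the idea of recurrent, volume-based fusion more concrete, the following is a minimal sketch and not the authors' network. It assumes that each view's image features have already been back-projected into a shared voxel grid, and it applies a textbook GRU cell independently at every observed voxel so that the fused state can be updated as each new view arrives. The names `GRUCell`, `fuse_views`, the random weight initialization, and the dictionary-based volume are all hypothetical choices made for illustration; biases are omitted for brevity.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class GRUCell:
    """Textbook GRU cell without biases (hypothetical, illustrative weights)."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.Wz, self.Uz = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)
        self.Wr, self.Ur = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)
        self.Wh, self.Uh = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)

    def step(self, x, h):
        # Update gate z and reset gate r.
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        # Candidate state computed from the input and the reset-gated state.
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Convention used here: h_new = (1 - z) * h + z * candidate.
        return [(1.0 - zi) * hi + zi * ci
                for zi, hi, ci in zip(z, h, cand)]


def fuse_views(cell, views, hid_dim):
    """Fuse per-view voxel features one view at a time.

    `views` is a list of dicts mapping a voxel index (i, j, k) to that
    view's feature vector at the voxel. Voxels not observed in a view
    keep their previous fused state.
    """
    state = {}
    for view in views:
        for voxel, feat in view.items():
            h_prev = state.get(voxel, [0.0] * hid_dim)
            state[voxel] = cell.step(feat, h_prev)
    return state


if __name__ == "__main__":
    cell = GRUCell(in_dim=2, hid_dim=3)
    views = [
        {(0, 0, 0): [1.0, 0.0]},
        {(0, 0, 0): [0.0, 1.0], (1, 0, 0): [0.5, 0.5]},
    ]
    fused = fuse_views(cell, views, hid_dim=3)
    for voxel, h in sorted(fused.items()):
        print(voxel, [round(v, 3) for v in h])
```

Because the gates are nonlinear functions of both the incoming feature and the current state, such an update is not restricted to the Gaussian, single-mode assumption behind a linear filter, which is the motivation the abstract gives for choosing GRUs. A practical system would use learned weights and a tensor library rather than nested Python lists.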
What problem does this paper attempt to address?