Recurrent Volume-Based 3-D Feature Fusion for Real-Time Multiview Object Pose Estimation.

Jun Wu, Xiangyu Ru, Sha Lu, Rong Xiong, Yue Wang
DOI: https://doi.org/10.1109/tim.2024.3414940
IF: 5.6
2024-01-01
IEEE Transactions on Instrumentation and Measurement
Abstract: It is a fundamental measurement task to accurately estimate the 6-D object pose for robotic manipulations. Utilizing multiview RGB observations to estimate poses is a reasonable approach to handle scene occlusions and monocular scaling problems. Current multiview RGB object pose estimation methods mostly follow back-end process pipelines, which are sensitive to outliers and cannot run in real time. In this article, we propose a recurrent multiview RGB object pose estimation framework that leverages a 3-D volume-based feature fusion mechanism. By employing recurrent architectures, we achieve high-quality multiframe pose estimation while still running on-the-fly. Adopting volumes to fuse features in 3-D space, we implement a dense and geometrically regularized real-time fusion module for multiview RGB features. To deal with the insufficiency of current filter-based methods when fusing non-unimodal distribution observations, we utilize nonlinear gated recurrent units (GRUs) to fuse multimodal distributions. Also, we design a detection-style self-assessment module that allows accurate estimates within local key regions instead of global perception. Experiments on the LineMOD, Occlusion LineMOD, and YCB Video datasets demonstrate that our proposed method effectively and efficiently improves performance for multiview object pose estimation.
What problem does this paper attempt to address?