An efficient lightweight deep neural network for real-time object 6D pose estimation with RGB-D inputs

Yuzhou Liang, Fan Chen, Guoyuan Liang, Xinyu Wu, Wei Feng
DOI: https://doi.org/10.1109/IJCNN52387.2021.9534175
2021-07-18
Abstract: 6D object pose estimation is an important technology in human-computer interaction. Previous works trained one or more complex networks to predict 6D poses. Although complex models generally perform well, their high storage and computation costs make them difficult to deploy on hardware platforms with limited computing capability, such as low-cost mobile terminals. Hence, reducing model complexity while maintaining accuracy remains a challenge. In this paper, we present a lightweight generic architecture that processes the color and depth images separately with two efficient backbone networks and then uses a fusion network to regress the pose. Furthermore, an iterative refinement network, compressed with the Filter Pruning via Geometric Median (FPGM) algorithm, refines the poses while improving real-time performance. Comprehensive experiments on two benchmark datasets, LineMOD and YCB-Video, confirm that the proposed model is more than twice as fast as the state-of-the-art (SOTA) DenseFusion. On the main metrics, BFLOPs (billions of floating-point operations) are reduced by 97.0% and the parameter size by 87.4%, while accuracy under the average distance (ADD) metric on LineMOD increases by 2.6%. Overall, the new model is shown to outperform SOTA methods in both efficiency and accuracy.
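The abstract names FPGM but does not describe how it chooses which filters to remove. The following is a minimal sketch of the general FPGM selection criterion, not the authors' implementation: within one layer, each filter (assumed here to be already flattened into a plain list of weights) is scored by the sum of its Euclidean distances to all other filters, and the lowest-scoring filters, which lie closest to the layer's geometric median, are treated as redundant and pruned. The function name `fpgm_prune_indices` and the `prune_ratio` parameter are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence


def fpgm_prune_indices(filters: Sequence[Sequence[float]], prune_ratio: float) -> List[int]:
    """Return the indices of filters to prune under an FPGM-style criterion.

    Illustrative sketch only (hypothetical helper, not the paper's code).
    Assumption: each filter is pre-flattened into a 1-D vector of weights,
    and all filters in the layer have the same length.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")

    num_prune = int(len(filters) * prune_ratio)

    # Score each filter by its summed Euclidean distance to every other filter.
    # A small sum means the filter sits near the layer's geometric median,
    # so its information is largely replaceable by the remaining filters.
    scores = []
    for i, f_i in enumerate(filters):
        total = sum(math.dist(f_i, f_j) for j, f_j in enumerate(filters) if j != i)
        scores.append((total, i))

    # Sort by score (ties broken by index) and prune the lowest-scoring filters.
    scores.sort()
    return sorted(i for _, i in scores[:num_prune])


if __name__ == "__main__":
    # Toy "layer" of five 2-D filters.
    layer = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0], [10.0, 0.0]]
    print(fpgm_prune_indices(layer, 0.4))  # → [2, 3]
```

In this toy layer the summed distances are 18, 15, 13, 14 and 32, so the two central filters (indices 2 and 3) are pruned. A norm-based criterion would instead have removed the all-zero filter at index 0 first; FPGM's premise is that redundancy with respect to the other filters, rather than a small magnitude, is what makes a filter safe to remove.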