RC6D: An RFID and CV Fusion System for Real-time 6D Object Pose Estimation
Bojun Zhang, Mengning Li, Xin Xie, Luoyi Fu, Xinyu Tong, Xiulong Liu
DOI: https://doi.org/10.1109/INFOCOM48880.2022.9796802
2022-01-01
Abstract: This paper studies the problem of 6D pose estimation, which is practically important in application scenarios such as robotic object grasping, obstacle avoidance in autonomous driving, and object integration in mixed reality. However, existing methods suffer from at least one of five major limitations: dependence on object identification, complex deployment, difficulty in data collection, low accuracy, and incomplete estimation. To overcome these limitations, this paper proposes RC6D, the first system to estimate 6D poses by fusing RFID and Computer Vision (CV) data with multi-modal deep learning techniques. In RC6D, we first detect 2D keypoints through a deep learning approach. We then propose a novel RFID-CV fusion neural network to predict the depth of the scene, and use the estimated depth to lift the 2D keypoints to 3D keypoints. Finally, we model the coordinate correspondences between the detected 2D and 3D keypoints, which are used to estimate the 6D pose of the target object. In implementing RC6D, we address three main technical challenges. (i) To predict 6D poses without using a CAD model, we propose a network architecture for monocular depth estimation. (ii) To train the neural network for 6D pose estimation without time-consuming 6D labeling, we use an unsupervised learning algorithm based on 2D-3D point pair matching. (iii) To detect the target object without relying on object identification, we leverage optical flow to delimit the object and RFID to obtain its information directly. The experimental results show that the localization error of RC6D is less than 10 cm with a probability higher than 90.64%, and its orientation estimation error is less than 10 degrees with a probability higher than 79.63%. Hence, the proposed RC6D system performs much better than state-of-the-art related solutions.
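The step of lifting 2D keypoints to 3D keypoints with estimated depth can be illustrated with a minimal sketch. It assumes an ideal pinhole camera without lens distortion and one depth value per keypoint; the function name `backproject_keypoints` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration and are not taken from the paper. The RFID-CV fusion network that produces the depth is not shown, and this sketch is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]          # (u, v) pixel coordinates
Point3D = Tuple[float, float, float]   # (x, y, z) in the camera frame, metres


def backproject_keypoints(
    keypoints_px: Sequence[Point2D],
    depths_m: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift 2D pixel keypoints to 3D camera-frame points.

    Assumption: ideal pinhole model, so a pixel (u, v) at depth z maps to
    x = (u - cx) * z / fx and y = (v - cy) * z / fy.
    """
    if len(keypoints_px) != len(depths_m):
        raise ValueError("need exactly one depth value per keypoint")

    points: List[Point3D] = []
    for (u, v), z in zip(keypoints_px, depths_m):
        if z <= 0:
            raise ValueError("depth must be positive")
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points


# Hypothetical example values: two keypoints, both 2 m from the camera.
example_points = backproject_keypoints(
    keypoints_px=[(320.0, 240.0), (420.0, 240.0)],
    depths_m=[2.0, 2.0],
    fx=500.0, fy=500.0, cx=320.0, cy=240.0,
)
```

In this example the first keypoint sits at the principal point and so lies on the optical axis, while the second is offset by 100 pixels horizontally. The resulting 2D-3D pairs are the kind of correspondences from which a pose could then be estimated.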