Estimating 6D Object Poses with Temporal Motion Reasoning for Robot Grasping in Cluttered Scenes

Rui Huang, Fengjun Mu, Wenjiang Li, Huaping Liu, Hong Cheng
DOI: https://doi.org/10.1109/LRA.2022.3147334
2022-01-01
Abstract: 6D object pose estimation is an essential task in vision-based robotic grasping and manipulation. Prior works regress an object's 6D pose from a single RGB-D frame without accounting for occluded objects in that frame, which limits their performance in human-robot collaboration scenarios with heavy occlusion. In this paper, we present an end-to-end model named TemporalFusion, which integrates temporal motion information from RGB-D images for 6D object pose estimation. The core of the proposed model is to embed and fuse the temporal motion information from multi-frame RGB-D sequences, allowing it to handle heavy occlusion in human-robot collaboration tasks. Furthermore, the proposed deep model can also produce stable pose sequences, which is essential for real-time robotic grasping tasks. We evaluated the proposed method on the YCB-Video dataset, and experimental results show that our model outperforms state-of-the-art approaches. Our code is available at https://github.com/mufengjun260/H-MPose.