Homologous multimodal fusion network with geometric constraint keypoints selection for 6D pose estimation
Guo Yi, Fei Wang, Qichuan Ding
DOI: https://doi.org/10.1016/j.eswa.2024.126022
Impact factor (IF): 8.5
2024-12-15
Expert Systems with Applications
Abstract: Estimating the 6D pose of objects from RGB-D images is a fundamental problem in computer vision, and its primary challenge lies in effectively fusing the two modalities of information, color and depth. In this work, we present a novel homologous multimodal fusion framework for 6D pose estimation from RGB-D images. Unlike existing methods, our approach takes homologous RGB-D data directly as input and exploits the innate semantic similarity between the two modalities through hierarchical global and local feature fusion. This design avoids the performance loss caused by converting the depth data into a point cloud. Additionally, we introduce a rotation-invariant residual network and a geometric constraint loss for calculating object keypoints, which further improve the accuracy and robustness of localization. Extensive comparative experiments and ablation studies validate the effectiveness of the proposed method, which achieves state-of-the-art performance on the LineMOD (99.9%), Occlusion-LineMOD (79.2%), and YCB-Video (97.1%) datasets. Finally, we demonstrate the practicality of our method through recognition and grasping experiments in cluttered real-world scenarios. A video is available at https://youtu.be/LS_m4N9b5tU.
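The abstract states that object keypoints are computed and then used for localization, but it does not say how a 6D pose is obtained from them. In many keypoint-based RGB-D pipelines, the pose is recovered by a least-squares rigid fit between keypoints defined on the object model and their predicted positions in the camera frame. The sketch below shows that standard step (the Kabsch/Umeyama SVD solution without scale) under the assumption that such a fit is used. The function name `fit_pose_from_keypoints`, the choice of 8 keypoints, and the synthetic data are hypothetical illustrations. This is not the authors' code, and it leaves out the rotation-invariant residual network and the geometric constraint loss entirely.

```python
import numpy as np


def fit_pose_from_keypoints(model_kps: np.ndarray, pred_kps: np.ndarray):
    """Hypothetical helper: least-squares rigid fit (Kabsch, no scale).

    model_kps: (K, 3) keypoints in the object's model frame.
    pred_kps:  (K, 3) corresponding keypoints predicted in the camera frame.
    Returns R (3x3) and t (3,) minimizing sum_i ||R @ m_i + t - p_i||^2.
    """
    assert model_kps.shape == pred_kps.shape and model_kps.shape[1] == 3
    mu_m = model_kps.mean(axis=0)
    mu_p = pred_kps.mean(axis=0)
    # Cross-covariance of the centred point sets: H = sum_i m_i p_i^T
    H = (model_kps - mu_m).T @ (pred_kps - mu_p)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = mu_p - R @ mu_m
    return R, t


# Usage on synthetic data (assumption: 8 noise-free keypoints)
rng = np.random.default_rng(0)
model = rng.normal(size=(8, 3))
# Random proper rotation built from a QR decomposition
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1
t_true = np.array([0.1, -0.2, 0.5])
# Each predicted keypoint is p_i = Q @ m_i + t_true
pred = model @ Q.T + t_true
R, t = fit_pose_from_keypoints(model, pred)
```

With noise-free correspondences the fit recovers the ground-truth rotation and translation exactly. In a real pipeline the predicted keypoints are noisy, so the accuracy of the keypoint predictor largely determines the accuracy of the fitted pose.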
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science