Mitigating imbalances in heterogeneous feature fusion for multi-class 6D pose estimation

Huafeng Wang, Haodu Zhang, Wanquan Liu, Weifeng Lv, Xianfeng Gu, Kexin Guo
DOI: https://doi.org/10.1016/j.knosys.2024.111918
IF: 8.139
2024-05-23
Knowledge-Based Systems
Abstract: Most 6D pose estimation studies treat RGB and depth features equally during fusion, which can limit model generalization, especially in multi-class tasks. This limitation arises from prevalent static map generation strategies that overlook discriminative features when downsampling sparse point clouds. Additionally, the commonly adopted direct concatenation approach to heterogeneous feature fusion often leads to an averaging effect, reducing the effectiveness of each feature. To tackle these challenges, we propose an effective model for dynamic graph structure feature extraction, aimed at capturing richer features from point clouds. We also introduce an adaptive fusion method for heterogeneous features that accounts for their unequal contributions to 6D pose estimation. Validation on the benchmark datasets LineMOD and YCB-Video demonstrates the method's effectiveness for multi-class 6D pose estimation, surpassing existing fusion methods. Notably, our method attains state-of-the-art (SOTA) results on the YCB-Video dataset. The code for this study is available at https://github.com/ZEROhands/6D_Pose_Estimate.
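The contrast the abstract draws between direct concatenation and adaptive fusion can be illustrated with a small sketch. This is not the authors' implementation (see their repository for that); it is a generic softmax-gated weighted sum, assuming both modalities share the same feature dimension. The function names and the scoring vectors `w_rgb` and `w_depth` are hypothetical and stand in for parameters that would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fusion(rgb_feat, depth_feat):
    """Baseline: direct concatenation, both modalities treated equally."""
    return list(rgb_feat) + list(depth_feat)


def adaptive_fusion(rgb_feat, depth_feat, w_rgb, w_depth):
    """Illustrative gated fusion (an assumption, not the paper's method).

    Each modality gets a scalar score from a dot product with a
    hypothetical learned vector; a softmax turns the two scores into
    weights that sum to 1, and the fused feature is the weighted sum.
    """
    s_rgb = sum(w * x for w, x in zip(w_rgb, rgb_feat))
    s_depth = sum(w * x for w, x in zip(w_depth, depth_feat))
    a_rgb, a_depth = softmax([s_rgb, s_depth])
    fused = [a_rgb * r + a_depth * d for r, d in zip(rgb_feat, depth_feat)]
    return fused, (a_rgb, a_depth)


if __name__ == "__main__":
    rgb = [1.0, 2.0]
    depth = [3.0, 4.0]
    # Scoring vectors chosen by hand so that depth receives more weight.
    fused, weights = adaptive_fusion(rgb, depth, [0.0, 0.0], [0.5, 0.5])
    print(concat_fusion(rgb, depth))
    print(fused, weights)
```

With scoring vectors that produce equal scores, the gate reduces to a plain average of the two features, which is the kind of equal treatment the abstract argues against; unequal scores shift the fused feature toward the more informative modality.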
computer science, artificial intelligence