Interactive Guidance Network for Object Detection Based on Radar-Camera Fusion
Jiapeng Wang, Linhua Kong, Dongxia Chang, Zisen Kong, Yao Zhao
DOI: https://doi.org/10.1007/s11042-023-16574-5
IF: 2.577
2024-01-01
Multimedia Tools and Applications
Abstract: In recent years, the performance of image-based object detection algorithms has improved significantly, especially in the field of autonomous driving. Camera sensors, however, are known to be susceptible to adverse weather conditions, which can significantly degrade their performance. In contrast, millimeter-wave radar is robust to such conditions. As a result, the fusion of millimeter-wave radar and camera sensors has gained considerable attention as a promising approach for object detection. Yet existing methods rarely take into account the correlation between the two modalities, leaving detection results vulnerable to radar noise, visual blur, and other confounding factors. To address this challenge, we propose an interactive guidance network that leverages a cross-modal attention mechanism, enabling the radar and camera modalities to guide each other and learn the underlying correlation between them. Our approach aims to achieve complementary fusion of features, making effective use of information from both sensors to enhance detection results. Moreover, a bi-directional fusion Feature Pyramid Network (FPN) structure is introduced, which generates feature maps with enhanced semantic and texture information. To assess the effectiveness of the proposed method, we conducted experiments on the nuScenes dataset. The results demonstrate that our approach outperforms existing state-of-the-art methods in terms of object detection accuracy.
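
Illustrative sketch (not from the paper): the abstract does not specify how the cross-modal attention is built, so the following is only a minimal, dependency-free approximation of the general idea of two modalities guiding each other. It assumes plain scaled dot-product attention with a residual connection and channel-wise concatenation; the function names (`cross_attend`, `interactive_guidance`) and the toy feature values are hypothetical, and the authors' actual network may differ substantially.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights; one row of weights per query."""
    scale = math.sqrt(len(queries[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def cross_attend(target, source):
    """Let `source` guide `target`: each target location gathers a weighted
    sum of source features and adds it to its own feature (residual)."""
    weights = attention_weights(target, source)
    guided = []
    for t, w in zip(target, weights):
        context = [sum(wj * s[c] for wj, s in zip(w, source)) for c in range(len(t))]
        guided.append([tc + cc for tc, cc in zip(t, context)])
    return guided


def interactive_guidance(camera, radar):
    """Apply guidance in both directions, then concatenate along channels.
    Assumes both inputs cover the same number of spatial locations."""
    camera_guided = cross_attend(camera, radar)   # radar guides camera
    radar_guided = cross_attend(radar, camera)    # camera guides radar
    return [c + r for c, r in zip(camera_guided, radar_guided)]  # list concat


# Toy features: 3 spatial locations, 2 channels per modality (made-up values).
camera = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
radar = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]
fused = interactive_guidance(camera, radar)
```

In this sketch each modality serves once as the query and once as the context, which is one simple way to read "mutually guide each other"; the fused output keeps one row per location and doubles the channel count.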