CR-DINO: A Novel Camera-Radar Fusion 2D Object Detection Model Based On Transformer

Yuhao Jin, Xiaohui Zhu, Yong Yue, Eng Gee Lim, Wei Wang
DOI: https://doi.org/10.1109/jsen.2024.3357775
IF: 4.3
2024-01-01
IEEE Sensors Journal
Abstract: Millimeter-wave (MMW) radar has been widely employed in autonomous driving because it directly acquires the spatial positions and velocities of objects and performs robustly in adverse weather conditions. However, radar lacks specific semantic information. To address this limitation, we exploit the complementary strengths of camera and radar through feature-level fusion and propose a fully Transformer-based model for object detection in autonomous driving. Specifically, we introduce a novel radar representation method and propose two camera-radar fusion architectures based on Swin Transformer. We name the proposed model CR-DINO and train and test it on the nuScenes dataset. We conducted several ablation experiments, and the best result we obtained was an mAP of 38.0%, surpassing other state-of-the-art camera-radar fusion object detection models.
engineering, electrical & electronic; instruments & instrumentation; physics, applied