RCBEVDet: Radar-camera Fusion in Bird’s Eye View for 3D Object Detection

Zhiwei Lin, Zhe Liu, Zhongyu Xia, Xinhao Wang, Yongtao Wang, Shengxiang Qi, Yang Dong, Nan Dong, Le Zhang, Ce Zhu
DOI: https://doi.org/10.1109/cvpr52733.2024.01414
2024-01-01
Abstract: Three-dimensional object detection is one of the key tasks in autonomous driving. To reduce costs in practice, low-cost multi-view cameras have been proposed for 3D object detection as a replacement for expensive LiDAR sensors. However, it is difficult to achieve highly accurate and robust 3D object detection by relying solely on cameras. An effective solution to this issue is to combine multi-view cameras with economical millimeter-wave radar sensors to achieve more reliable multi-modal 3D object detection. In this paper, we introduce RCBEVDet, a radar-camera fusion 3D object detection method in the bird's eye view (BEV). Specifically, we first design RadarBEVNet for radar BEV feature extraction. RadarBEVNet consists of a dual-stream radar backbone and a Radar Cross-Section (RCS) aware BEV encoder. In the dual-stream radar backbone, a point-based encoder and a transformer-based encoder are proposed to extract radar features, with an injection and extraction module to facilitate communication between the two encoders. The RCS-aware BEV encoder uses RCS as an object-size prior when scattering the point features in BEV. In addition, we present the Cross-Attention Multi-layer Fusion module, which automatically aligns the multi-modal BEV features from radar and camera with the deformable attention mechanism and then fuses them with channel and spatial fusion layers. Experimental results show that RCBEVDet achieves new state-of-the-art radar-camera fusion results on the nuScenes and View-of-Delft (VoD) 3D object detection benchmarks. Furthermore, RCBEVDet achieves better 3D detection results than all real-time camera-only and radar-camera 3D object detectors, with a faster inference speed of 21 to 28 FPS. The source code will be released at https://github.com/VDIGPKU/RCBEVDet.
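The idea of using RCS as an object-size prior during BEV scattering can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rcs_aware_scatter`, the grid parameters, the rule that the scatter radius equals `radius_scale * rcs`, and the averaging of overlapping features are all assumptions made for illustration. The sketch only shows that a radar point with a larger RCS spreads its feature over more BEV cells than a point with a smaller RCS.

```python
import numpy as np

def rcs_aware_scatter(points_xy, rcs, feats, grid_size=8, cell=1.0, radius_scale=1.0):
    """Scatter per-point features into a BEV grid (illustrative sketch only).

    Each point fills every cell whose center lies within a disk around the
    point. The disk radius is assumed to be radius_scale * rcs, a hypothetical
    rule standing in for the object-size prior described in the abstract.
    """
    bev = np.zeros((grid_size, grid_size, feats.shape[1]), dtype=np.float32)
    count = np.zeros((grid_size, grid_size, 1), dtype=np.float32)

    # Metric coordinates of every cell center.
    centers = (np.arange(grid_size) + 0.5) * cell
    cx, cy = np.meshgrid(centers, centers, indexing="ij")

    for (x, y), r, f in zip(points_xy, rcs, feats):
        radius = max(radius_scale * float(r), 0.0)
        mask = (cx - x) ** 2 + (cy - y) ** 2 <= radius ** 2
        # Always fill the cell containing the point, even for a zero radius.
        i = int(np.clip(x // cell, 0, grid_size - 1))
        j = int(np.clip(y // cell, 0, grid_size - 1))
        mask[i, j] = True
        bev[mask] += f
        count[mask] += 1.0

    # Average the features where several points overlap.
    return bev / np.maximum(count, 1.0)
```

With a single point at the center of a cell and `cell=1.0`, an RCS of 0 fills only the cell containing the point, while an RCS of 1 also reaches the four directly adjacent cells.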
What problem does this paper attempt to address?