RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection

Zhiwei Lin,Zhe Liu,Zhongyu Xia,Xinhao Wang,Yongtao Wang,Shengxiang Qi,Yang Dong,Nan Dong,Le Zhang,Ce Zhu

2024-03-25

Abstract: Three-dimensional object detection is one of the key tasks in autonomous driving. To reduce costs in practice, low-cost multi-view cameras have been proposed for 3D object detection as a replacement for expensive LiDAR sensors. However, it is difficult to achieve highly accurate and robust 3D object detection with cameras alone. An effective solution is to combine multi-view cameras with economical millimeter-wave radar sensors for more reliable multi-modal 3D object detection. In this paper, we introduce RCBEVDet, a radar-camera fusion 3D object detection method in the bird's eye view (BEV). Specifically, we first design RadarBEVNet for radar BEV feature extraction. RadarBEVNet consists of a dual-stream radar backbone and a Radar Cross-Section (RCS) aware BEV encoder. In the dual-stream radar backbone, a point-based encoder and a transformer-based encoder extract radar features, with an injection and extraction module to facilitate communication between the two encoders. The RCS-aware BEV encoder takes RCS as an object size prior when scattering point features in BEV. In addition, we present the Cross-Attention Multi-layer Fusion module, which automatically aligns the multi-modal BEV features from radar and camera with the deformable attention mechanism and then fuses them with channel and spatial fusion layers. Experimental results show that RCBEVDet achieves new state-of-the-art radar-camera fusion results on the nuScenes and View-of-Delft (VoD) 3D object detection benchmarks. Furthermore, RCBEVDet achieves better 3D detection results than all real-time camera-only and radar-camera 3D object detectors with a faster inference speed at 21~28 FPS. The source code will be released at

Computer Science

What problem does this paper attempt to address?

The paper addresses the problem of achieving efficient, accurate, and robust 3D object detection in autonomous driving. Specifically, the researchers focus on using low-cost multi-view cameras and millimeter-wave radar sensors in place of expensive LiDAR, aiming for more cost-effective multi-modal 3D object detection. Although multi-view cameras capture rich color and texture information, relying on cameras alone makes it difficult to achieve high-precision and robust 3D object detection, especially under adverse weather or low-light conditions. Combining multi-view cameras with millimeter-wave radar is therefore a feasible and effective solution.

The paper proposes a radar-camera fusion 3D object detection method named RCBEVDet, which performs feature extraction and fusion in the Bird's Eye View (BEV). The main contributions of RCBEVDet include:

1. **Proposing RCBEVDet**: An efficient radar-camera multi-modal 3D object detector that achieves high-precision and robust 3D object detection while maintaining real-time inference speed.
2. **Designing RadarBEVNet**: A network specifically for efficient radar feature extraction, including a dual-stream radar backbone and an RCS-aware BEV encoder.
3. **Introducing the Cross-Attention Multi-layer Fusion module**: Achieving dynamic alignment and fusion of radar and camera features through a deformable cross-attention mechanism.

Experimental results show that RCBEVDet achieves new state-of-the-art radar-camera fusion results on the nuScenes and View-of-Delft (VoD) 3D object detection benchmarks. It also achieves better detection results than existing real-time camera-only and radar-camera 3D object detectors while running at a faster inference speed (21~28 FPS). Additionally, RCBEVDet demonstrates good robustness in the case of sensor failures.
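The idea of using RCS as an object size prior when scattering radar point features into BEV can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `rcs_aware_scatter`, the linear mapping from RCS to radius (`radius_per_rcs`, `min_radius`), and the Gaussian weighting are all assumptions chosen only to show the principle that a point with a larger RCS spreads its feature over more BEV cells.

```python
import numpy as np

def rcs_aware_scatter(points_xy, rcs, feats, grid_size=8, cell=1.0,
                      radius_per_rcs=0.5, min_radius=0.5):
    """Scatter per-point features into a square BEV grid.

    Each radar point covers a disc whose radius grows with its RCS
    (assumed linear mapping, hypothetical parameters). Overlapping
    contributions are averaged with Gaussian weights.
    """
    num_channels = feats.shape[1]
    bev = np.zeros((grid_size, grid_size, num_channels), dtype=np.float64)
    weight = np.zeros((grid_size, grid_size), dtype=np.float64)

    # Metric coordinates of every BEV cell center.
    centers = (np.arange(grid_size) + 0.5) * cell
    gx, gy = np.meshgrid(centers, centers, indexing="ij")

    for (x, y), r, f in zip(points_xy, rcs, feats):
        # Assumption: radius is proportional to RCS, with a floor so that
        # weak reflectors still occupy at least their own cell.
        radius = max(min_radius, radius_per_rcs * max(float(r), 0.0))
        d2 = (gx - x) ** 2 + (gy - y) ** 2
        inside = d2 <= radius ** 2
        w = np.exp(-d2 / (2.0 * radius ** 2)) * inside
        bev += w[..., None] * f
        weight += w

    # Normalize cells that received at least one contribution.
    hit = weight > 0
    bev[hit] /= weight[hit][:, None]
    return bev
```

With a 1 m cell size, a point with RCS 0 falls back to the minimum radius and fills only its own cell, while a point with RCS 4 gets a 2 m radius and fills a disc of neighboring cells. A learned encoder would replace the fixed Gaussian weights, but the size-prior behavior is the same in spirit.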

RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network

BEV-Radar: Bidirectional Radar-Camera Fusion for 3D Object Detection

RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion

HVDetFusion: A Simple and Robust Camera-Radar Fusion Framework

Bridging the View Disparity Between Radar and Camera Features for Multi-Modal Fusion 3D Object Detection

RCFusion: Fusing 4-D Radar and Camera with Bird's-Eye View Features for 3-D Object Detection

RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection

UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection

Traffic Object Detection for Autonomous Driving Fusing LiDAR and Pseudo 4D-Radar under Bird’s-Eye-View

IRBEVF-Q: Optimization of Image-Radar Fusion Algorithm Based on Bird's Eye View Features

KAN-RCBEVDepth: A multi-modal fusion algorithm in object detection for autonomous driving

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

SimpleBEV: Improved LiDAR-Camera Fusion Architecture for 3D Object Detection

SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection

MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion

HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection

CenterRadarNet: Joint 3D Object Detection and Tracking Framework using 4D FMCW Radar

BEVFusion4D: Learning LiDAR-Camera Fusion Under Bird's-Eye-View via Cross-Modality Guidance and Temporal Aggregation

A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data