RCFusion: Fusing 4-D Radar and Camera with Bird's-Eye View Features for 3-D Object Detection.
Lianqing Zheng, Sen Li, Bin Tan, Long Yang, Sihan Chen, Libo Huang, Jie Bai, Xichan Zhu, Zhixiong Ma
DOI: https://doi.org/10.1109/tim.2023.3280525
IF: 5.6
2023-01-01
IEEE Transactions on Instrumentation and Measurement
Abstract: Camera and millimeter-wave (MMW) radar fusion is essential for accurate and robust autonomous driving systems. With the advancement of radar technology, next-generation high-resolution automotive radar, i.e., 4-D radar, has emerged. In addition to the target range, azimuth, and Doppler velocity measurements of traditional radar, 4-D radar provides elevation measurement to create a denser “point cloud.” In this study, we propose a camera and 4-D radar fusion network called RCFusion, which achieves multimodal feature fusion under a unified bird’s-eye view (BEV) space to accomplish 3-D object detection tasks. In the camera stream, multiscale feature maps are obtained by the image backbone and feature pyramid network (FPN); they are then converted into orthographic feature maps by an orthographic feature transform (OFT). Next, enhanced and fine-grained image BEV features are obtained via a designed shared attention encoder. Meanwhile, in the 4-D radar stream, a newly designed component named radar PillarNet efficiently encodes the radar features to generate radar pseudo-images, which are fed into the point cloud backbone to create radar BEV features. An interactive attention module (IAM) is proposed for the fusion stage, which outputs a valid fusion of the two-modal BEV features. Finally, a generic detection head predicts the object classes and locations. The proposed RCFusion is validated on the TJ4DRadSet and View-of-Delft (VoD) datasets. The experimental results and analysis show that the proposed method can effectively fuse camera and 4-D radar features to achieve robust detection performance.
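To make the idea of a radar "pseudo-image" in BEV space more concrete, the following is a minimal sketch, not the radar PillarNet described in the abstract. It assumes each 4-D radar detection is given as (range, azimuth, elevation, Doppler), converts it to Cartesian coordinates, and pools detections into a ground-plane grid using hand-picked statistics (count, mean height, mean Doppler), whereas the paper's component learns its encoding. The function names, the axis convention, the grid extents, and the cell size are all illustrative assumptions.

```python
import math


def radar_to_cartesian(rng, azimuth, elevation):
    """Convert one radar measurement (metres, radians) to (x, y, z).

    Assumed convention: x forward, y left, z up; azimuth is measured from
    the x axis in the ground plane, elevation upward from that plane.
    """
    ground = rng * math.cos(elevation)  # projection onto the ground plane
    return (ground * math.cos(azimuth),
            ground * math.sin(azimuth),
            rng * math.sin(elevation))


def bev_pseudo_image(detections, x_max=40.0, y_half=20.0, cell=1.0):
    """Pool detections into a BEV grid of [count, mean z, mean Doppler].

    detections: iterable of (range, azimuth, elevation, doppler) tuples.
    Returns a rows x cols nested list; cells with no detections stay zero.
    Grid extents and cell size are hypothetical defaults.
    """
    rows = math.ceil(x_max / cell)
    cols = math.ceil(2 * y_half / cell)
    grid = [[[0.0, 0.0, 0.0] for _ in range(cols)] for _ in range(rows)]

    for rng, az, el, doppler in detections:
        x, y, z = radar_to_cartesian(rng, az, el)
        # Discard detections outside the region covered by the grid.
        if not (0.0 <= x < x_max and -y_half <= y < y_half):
            continue
        r = min(int(x / cell), rows - 1)
        c = min(int((y + y_half) / cell), cols - 1)
        feat = grid[r][c]
        feat[0] += 1.0      # number of detections in this cell
        feat[1] += z        # running sum of height
        feat[2] += doppler  # running sum of Doppler velocity

    # Turn the running sums into per-cell means.
    for row in grid:
        for feat in row:
            if feat[0] > 0:
                feat[1] /= feat[0]
                feat[2] /= feat[0]
    return grid
```

For example, `bev_pseudo_image([(10.0, 0.0, 0.0, 2.0), (10.0, 0.0, 0.0, 4.0)])` places both detections in the same cell, 10 m ahead on the centre line, and stores their mean Doppler there. A grid of this shape is what lets radar and image features be compared cell by cell once the image features have also been brought into BEV space.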