UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection

Haocheng Zhao, Runwei Guan, Taoyu Wu, Ka Lok Man, Limin Yu, Yutao Yue

2024-09-23

Abstract: 4D millimeter-wave (MMW) radar, which provides height information and denser point cloud data than 3D MMW radar, has become increasingly popular in 3D object detection. In recent years, radar-vision fusion models have demonstrated performance close to that of LiDAR-based models, offering advantages in terms of lower hardware costs and better resilience in extreme conditions. However, many radar-vision fusion models treat radar as a sparse LiDAR, underutilizing radar-specific information. Additionally, these multi-modal networks are often sensitive to the failure of a single modality, particularly vision. To address these challenges, we propose the Radar Depth Lift-Splat-Shoot (RDL) module, which integrates radar-specific data into the depth prediction process, enhancing the quality of visual Bird's-Eye View (BEV) features. We further introduce a Unified Feature Fusion (UFF) approach that extracts BEV features across different modalities using a shared module. To assess the robustness of multi-modal models, we develop a novel Failure Test (FT) ablation experiment, which simulates vision modality failure by injecting Gaussian noise. We conduct extensive experiments on the View-of-Delft (VoD) and TJ4D datasets. The results demonstrate that our proposed Unified BEVFusion (UniBEVFusion) network significantly outperforms state-of-the-art models on the TJ4D dataset, with improvements of 1.44 in 3D and 1.72 in BEV object detection accuracy.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper aims to address three key issues in 3D object detection with multimodal radar and vision fusion:

1. **Fully Utilizing Radar Characteristics**: Existing radar-vision fusion models typically treat radar as a sparse LiDAR and fail to exploit radar-specific information. The paper proposes a new Radar Depth Lift-Splat-Shoot (RDL) module, which integrates radar-specific data (such as Radar Cross Section, RCS) into the depth prediction process to improve the quality of visual Bird's-Eye View (BEV) features.
2. **Enhancing Model Robustness**: Multimodal networks perform poorly when a single modality fails, especially vision. To address this, the paper introduces the Unified Feature Fusion (UFF) method, which extracts BEV features from the different modalities through shared modules and unifies them, making the model more robust to single-modality failure.
3. **Evaluating Model Robustness**: To assess the robustness of multimodal models, the paper develops a new Failure Test (FT) experiment, which simulates vision failure by injecting Gaussian noise into the visual input.

Experimental results show that the proposed UniBEVFusion network improves 3D and BEV object detection accuracy on the TJ4D dataset by 1.44 and 1.72, respectively, significantly outperforming existing models. In summary, the paper primarily addresses how to better utilize radar information and how to make multimodal models more robust.
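The Failure Test idea of corrupting only the visual input can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `inject_gaussian_noise`, the parameters `std`, `replace` and `seed`, and the choice between adding noise to the image and replacing the image with noise are all assumptions made for illustration, since the summary states only that Gaussian noise is injected.

```python
import numpy as np

def inject_gaussian_noise(image, std=1.0, replace=False, seed=None):
    """Simulate a vision failure on a float image array of shape (H, W, C).

    std, replace and seed are illustrative parameters, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=std, size=image.shape)
    # replace=True discards the image entirely; otherwise noise is added on top.
    corrupted = noise if replace else image + noise
    return corrupted.astype(image.dtype)

# Hypothetical camera frame: a normalized 4x6 RGB image of zeros.
frame = np.zeros((4, 6, 3), dtype=np.float32)
noisy = inject_gaussian_noise(frame, std=0.5, seed=0)
failed = inject_gaussian_noise(frame, std=0.5, replace=True, seed=0)
```

In a test of this kind, the radar input would be left untouched, so any drop in detection accuracy reflects how strongly the fused model depends on the camera branch.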

Millimeter-Wave Radar and Vision Fusion Target Detection Algorithm Based on an Extended Network

Bridging the View Disparity Between Radar and Camera Features for Multi-Modal Fusion 3D Object Detection

Fusing LiDAR and Radar with Pillars Attention for 3D Object Detection

RaViDeep: Target Detection Based on Deep Fusion of Radar and Vision in Berthing Scenarios

RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection

RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network

BEV-Radar: Bidirectional Radar-Camera Fusion for 3D Object Detection

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

Radar and Camera Fusion for Multi-Task Sensing in Autonomous Driving

HVDetFusion: A Simple and Robust Camera-Radar Fusion Framework

BEVFusion4D: Learning LiDAR-Camera Fusion Under Bird's-Eye-View via Cross-Modality Guidance and Temporal Aggregation

MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion

Multi-Modal and Multi-Scale Fusion 3D Object Detection of 4D Radar and LiDAR for Autonomous Driving

MSSF: A 4D Radar and Camera Fusion Framework With Multi-Stage Sampling for 3D Object Detection in Autonomous Driving

RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection

UniBEV: Multi-modal 3D Object Detection with Uniform BEV Encoders for Robustness against Missing Sensor Modalities

InterFusion: Interaction-based 4D Radar and LiDAR Fusion for 3D Object Detection

MUFASA: Multi-View Fusion and Adaptation Network with Spatial Awareness for Radar Object Detection