Abstract: Depth estimation is critical in autonomous driving for interpreting 3D scenes accurately. Recently, radar-camera depth estimation has attracted considerable interest due to the robustness and low cost of radar. This paper introduces a two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) for dense depth estimation, combining RGB imagery with sparse and noisy radar point cloud data. The first stage addresses radar-specific challenges, such as ambiguous elevation and noisy measurements, by predicting a radar confidence map and a preliminary coarse depth map. A novel approach is presented for generating the ground truth for the confidence map, which involves associating each radar point with its corresponding object to identify potential projection surfaces. These maps, together with the initial radar input, are processed by a second encoder. For the final depth estimation, we introduce a confidence-aware gated fusion mechanism to integrate radar and image features effectively, thereby enhancing the reliability of the depth map by filtering out radar noise. Our methodology, evaluated on the nuScenes dataset, demonstrates superior performance, improving upon the current leading model by 3.2% in Mean Absolute Error (MAE) and 2.7% in Root Mean Square Error (RMSE). Code: https://github.com/harborsarah/CaFNet
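
The two error metrics quoted in the abstract are commonly computed only over pixels that have valid ground-truth depth. The following is a minimal Python sketch under that assumption; the function names and the 80 m depth cap are illustrative choices, not taken from the paper or its repository.

```python
import numpy as np


def depth_errors(pred, gt, min_depth=0.0, max_depth=80.0):
    """Return (MAE, RMSE) over pixels with valid ground-truth depth.

    min_depth and max_depth are illustrative assumptions, not values
    taken from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Sparse ground truth: pixels without a measurement are stored as 0.
    valid = (gt > min_depth) & (gt <= max_depth)
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")
    err = pred[valid] - gt[valid]
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return mae, rmse


def relative_improvement(baseline, ours):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline
```

Under this reading, an "improvement of 3.2% in MAE" means `relative_improvement(baseline_mae, new_mae)` evaluates to 3.2, i.e. a relative reduction of the error rather than an absolute difference in metres.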

What problem does this paper attempt to address?

The paper addresses accurate depth estimation for autonomous driving. Specifically, it combines RGB images with radar point cloud data to generate dense depth maps. The problem breaks down as follows:

1. **The importance of depth estimation in autonomous driving**:
   - Accurate 3D scene understanding is crucial for autonomous driving, and depth estimation is a key task in achieving it.
2. **Limitations of existing methods**:
   - Monocular RGB images provide rich visual information but no direct depth measurements, which limits depth estimation performance.
   - LiDAR provides high-quality depth maps, but is costly and sensitive to lighting and weather conditions.
   - Radar sensors are low-cost and robust, but their point clouds are sparse and noisy, which makes them hard to use directly for depth estimation.
3. **The method proposed in this paper**:
   - The paper proposes a two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) that combines RGB images with sparse and noisy radar point cloud data for dense depth estimation.
4. **Specific challenges and solutions**:
   - **Radar-specific challenges**: Radar point clouds are sparse and noisy, their elevation is ambiguous, and multipath effects introduce a large number of false targets.
   - **Solutions**:
     - **First stage**: Address the radar-specific challenges by predicting a radar confidence map and a preliminary coarse depth map.
     - **Second stage**: Introduce the confidence-aware gated fusion mechanism (CaGF) to integrate radar and image features effectively and improve the reliability of the depth map.
5. **Innovations**:
   - A new method for generating the ground truth of the radar confidence map, which associates each radar point with its corresponding object to identify potential projection surfaces.
   - The confidence-aware gated fusion technique (CaGF), which limits the propagation of erroneous radar data and improves overall depth estimation performance.
6. **Experimental results**:
   - Evaluated on the nuScenes dataset, CaFNet improves on the current leading model by 3.2% in mean absolute error (MAE) and 2.7% in root mean square error (RMSE).

In summary, the paper tackles depth estimation for autonomous driving by fusing RGB images with radar point cloud data in the Confidence-aware Fusion Net (CaFNet), improving the accuracy and reliability of the estimated depth.
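
The idea of confidence-aware gated fusion can be illustrated with a small NumPy sketch. This is not the paper's CaGF formulation, which the summary above does not spell out. It is one plausible form, assuming a per-pixel learned gate (a 1x1 convolution followed by a sigmoid) that is additionally scaled by the predicted radar confidence. The function name, parameters, and the residual combination are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def confidence_gated_fusion(img_feat, radar_feat, confidence, w_gate, b_gate):
    """Fuse image and radar features, down-weighting unreliable radar pixels.

    Hypothetical sketch, not the paper's exact CaGF layer.

    img_feat, radar_feat: arrays of shape (C, H, W)
    confidence:           array of shape (H, W) with values in [0, 1]
    w_gate:               array of shape (C, 2C), weights of a 1x1 convolution
    b_gate:               array of shape (C,), bias of the 1x1 convolution
    """
    # Concatenate along the channel axis: (2C, H, W).
    stacked = np.concatenate([img_feat, radar_feat], axis=0)
    # A 1x1 convolution is a per-pixel linear map over channels.
    logits = np.einsum("oc,chw->ohw", w_gate, stacked) + b_gate[:, None, None]
    gate = sigmoid(logits)
    # Radar features pass only where both the gate and the confidence allow.
    weighted_radar = confidence[None, :, :] * gate * radar_feat
    # Residual form: with zero confidence the image features are unchanged.
    return img_feat + weighted_radar
```

With this form, a radar pixel whose confidence is 0 contributes nothing to the fused features, which is the filtering behaviour the summary attributes to CaGF. In a trained network, `w_gate` and `b_gate` would be learned and the confidence map would come from the first stage.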

CaFNet: A Confidence-Driven Framework for Radar Camera Depth Estimation

RadarCam-Depth: Radar-Camera Fusion for Depth Estimation with Learned Metric Scale

MFF-Net: Towards Efficient Monocular Depth Completion With Multi-Modal Feature Fusion

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Semantic-guided Depth Completion from Monocular Images and 4D Radar Data

RaViDeep: Target Detection Based on Deep Fusion of Radar and Vision in Berthing Scenarios

A Robust Monocular Depth Estimation Framework Based on Light-Weight ERF-Pspnet for Day-Night Driving Scenes

RCDformer: Transformer-based dense depth estimation by sparse radar and camera

Radar-Camera Pixel Depth Association for Depth Completion

Depth Estimation from Monocular Images and Sparse Radar Data

GET-UP: GEomeTric-aware Depth Estimation with Radar Points UPsampling

Radar and Camera Fusion for Multi-Task Sensing in Autonomous Driving

Camera-Radar Fusion with Radar Channel Extension and Dual-CBAM-FPN for Object Detection

RIDERS: Radar-Infrared Depth Estimation for Robust Sensing

Depth Estimation fusing Image and Radar Measurements with Uncertain Directions

Radar-Camera Sensor Fusion for Joint Object Detection and Distance Estimation in Autonomous Vehicles

SRFNet: Monocular Depth Estimation with Fine-grained Structure via Spatial Reliability-oriented Fusion of Frames and Events

RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network

RCRFNet: Enhancing Object Detection with Self-Supervised Radar–Camera Fusion and Open-Set Recognition

RADIANT: Radar-Image Association Network for 3D Object Detection

CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection