A Concise but High-performing Network for Image Guided Depth Completion in Autonomous Driving

Moyun Liu,Bing Chen,Youping Chen,Jingming Xie,Lei Yao,Yang Zhang,Joey Tianyi Zhou

2024-04-22

Abstract: Depth completion is a crucial task in autonomous driving, aiming to convert a sparse depth map into a dense depth prediction. Because of its potentially rich semantic information, an RGB image is commonly fused with the sparse depth to improve completion. Image-guided depth completion involves three key challenges: 1) how to effectively fuse the two modalities; 2) how to better recover depth information; and 3) how to achieve real-time prediction for practical autonomous driving. To solve the above problems, we propose a concise but effective network, named CENet, to achieve high-performance depth completion with a simple and elegant structure. Firstly, we use a fast guidance module to fuse the two sensor features, utilizing abundant auxiliary features extracted from the color space. Unlike other commonly used complicated guidance modules, our approach is intuitive and low-cost. In addition, we find and analyze the optimization inconsistency problem for observed and unobserved positions, and a decoupled depth prediction head is proposed to alleviate the issue. The proposed decoupled head can better output the depth of valid and invalid positions with very little extra inference time. Based on the simple structure of dual-encoder and single-decoder, our CENet can achieve a superior balance between accuracy and efficiency. In the KITTI depth completion benchmark, our CENet attains competitive performance and inference speed compared with the state-of-the-art methods. To validate the generalization of our method, we also evaluate on the indoor NYUv2 dataset, and our CENet still achieves impressive results. The code of this work will be available at
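The abstract describes a decoupled head that treats observed (valid) and unobserved (invalid) positions separately. As a rough illustration only, the sketch below merges two hypothetical head outputs with the validity mask of the sparse input. The function name `merge_decoupled_heads`, the two-output layout, the hard per-pixel selection, and the convention that a depth of zero marks a missing measurement are all assumptions made for this example; the abstract does not specify how the authors combine the branches.

```python
from typing import List

Grid = List[List[float]]  # a tiny depth map as nested lists, in meters


def merge_decoupled_heads(sparse: Grid, observed_pred: Grid, unobserved_pred: Grid) -> Grid:
    """Select, per pixel, one of two hypothetical head outputs.

    Assumption: sparse[y][x] > 0 means the sensor observed that pixel,
    so the "observed" branch is used there; elsewhere the "unobserved"
    branch is used.
    """
    merged: Grid = []
    for s_row, o_row, u_row in zip(sparse, observed_pred, unobserved_pred):
        merged.append([o if s > 0.0 else u for s, o, u in zip(s_row, o_row, u_row)])
    return merged


# Toy 2x2 example: only the top-right pixel has a sparse measurement.
sparse = [[0.0, 5.2], [0.0, 0.0]]
observed_pred = [[9.9, 5.1], [9.9, 9.9]]    # only meaningful where sparse > 0
unobserved_pred = [[4.8, 7.7], [6.0, 6.3]]  # only meaningful where sparse == 0
dense = merge_decoupled_heads(sparse, observed_pred, unobserved_pred)
```

Here `dense` takes its top-right value from the observed branch and the other three values from the unobserved branch. A learned soft weighting would be an equally plausible reading of "decoupled"; the hard selection is chosen only because it is the simplest form to show.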

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses image-guided depth completion in autonomous driving: converting sparse depth maps into dense depth predictions to improve the perception capabilities of autonomous driving systems. Image-guided depth completion faces three main challenges:

1. **Effective fusion of two modalities**: How to fuse RGB images and sparse depth maps so that the image actually improves depth recovery.
2. **Enhancing depth information recovery**: How to better recover depth information.
3. **Achieving real-time prediction capability**: How to predict in real time while maintaining high accuracy, as practical autonomous driving requires.

To address these issues, the authors propose a simple yet high-performance network, CHNet. The main features of CHNet include:

- **Fast guidance module**: It fuses the features of the two sensors using the rich auxiliary information extracted from the color space. Compared to existing complex guidance modules, it is intuitive and low-cost.
- **Decoupled depth prediction head**: The authors identify and analyze an optimization inconsistency between observed and unobserved positions, and introduce a decoupled depth prediction head that predicts the depth of valid and invalid positions separately while adding very little inference time.
- **Dual encoder single decoder structure**: With this simple structure, CHNet achieves a strong balance between accuracy and computational efficiency.

On the KITTI depth completion benchmark, CHNet achieves accuracy and inference speed competitive with state-of-the-art methods, and it also performs well on the indoor NYUv2 dataset. These results support the effectiveness and generalization of CHNet.
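Benchmark comparisons like the one summarized above are usually scored only at pixels where ground-truth depth exists, since the reference depth maps are not fully dense. The sketch below computes root-mean-square error and mean absolute error under that masking. It is a minimal illustration, not the benchmark's official evaluation code: the function name `masked_errors` and the convention that a ground-truth value of zero means "no reference depth" are assumptions of this example.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # a tiny depth map as nested lists


def masked_errors(pred: Grid, gt: Grid) -> Tuple[float, float]:
    """Return (RMSE, MAE) over pixels whose ground truth is valid.

    Assumption: gt[y][x] > 0 marks a pixel with reference depth;
    all other pixels are skipped.
    """
    squared_sum = 0.0
    absolute_sum = 0.0
    count = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if g > 0.0:
                diff = p - g
                squared_sum += diff * diff
                absolute_sum += abs(diff)
                count += 1
    if count == 0:
        raise ValueError("ground truth contains no valid pixels")
    return math.sqrt(squared_sum / count), absolute_sum / count


# Toy 2x2 example: the top-left pixel has no ground truth and is ignored.
pred = [[4.0, 5.0], [6.0, 7.0]]
gt = [[0.0, 5.0], [9.0, 3.0]]
rmse, mae = masked_errors(pred, gt)
```

In this toy case the three valid pixels have errors of 0, -3 and 4, so the squared errors sum to 25 and the absolute errors sum to 7 before averaging over three pixels.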

MFF-Net: Towards Efficient Monocular Depth Completion With Multi-Modal Feature Fusion

Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Least Square Estimation Network for Depth Completion

Learning Guided Convolutional Network for Depth Completion

PENet: Towards Precise and Efficient Image Guided Depth Completion

Gated Recurrent Fusion UNet for Depth Completion

An Efficient Information-Reinforced Lidar Deep Completion Network without RGB Guided

Gated Cross-Attention Network for Depth Completion

CU-Net: LiDAR Depth-Only Completion With Coupled U-Net

DenseLiDAR: A Real-Time Pseudo Dense Depth Guided Depth Completion Network

Agspn: Efficient Attention-Gated Spatial Propagation Network for Depth Completion

Depth Cue Enhancement and Guidance Network for RGB-D Salient Object Detection

RigNet++: Efficient Repetitive Image Guided Network for Depth Completion

UAMD-Net: A Unified Adaptive Multimodal Neural Network for Dense Depth Completion

Multi-stage Multi-scale Color Guided Depth Image Completion for Road Scenes

Depth Completion from Sparse LiDAR Data with Depth-Normal Constraints

SGSNet: A Lightweight Depth Completion Network Based on Secondary Guidance and Spatial Fusion

Adaptive Context-Aware Multi-Modal Network for Depth Completion