A Concise but High-performing Network for Image Guided Depth Completion in Autonomous Driving

Moyun Liu, Bing Chen, Youping Chen, Jingming Xie, Lei Yao, Yang Zhang, Joey Tianyi Zhou
2024-04-22
Abstract: Depth completion is a crucial task in autonomous driving, aiming to convert a sparse depth map into a dense depth prediction. Owing to their potentially rich semantic information, RGB images are commonly fused in to enhance completion. Image-guided depth completion involves three key challenges: 1) how to effectively fuse the two modalities; 2) how to better recover depth information; and 3) how to achieve real-time prediction for practical autonomous driving. To solve these problems, we propose a concise but effective network, named CENet, that achieves high-performance depth completion with a simple and elegant structure. First, we use a fast guidance module to fuse the features of the two sensors, utilizing abundant auxiliary features extracted from the color space. Unlike the complicated guidance modules commonly used elsewhere, our approach is intuitive and low-cost. In addition, we identify and analyze the optimization inconsistency problem between observed and unobserved positions, and propose a decoupled depth prediction head to alleviate it. The proposed decoupled head better outputs the depth of valid and invalid positions with very little extra inference time. Based on a simple dual-encoder, single-decoder structure, our CENet achieves a superior balance between accuracy and efficiency. On the KITTI depth completion benchmark, CENet attains competitive performance and inference speed compared with state-of-the-art methods. To validate the generalization of our method, we also evaluate it on the indoor NYUv2 dataset, where CENet still achieves impressive results. The code of this work will be available at
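The abstract describes the fast guidance module only as intuitive and low-cost, without giving its formula. As a rough illustration of what a cheap fusion step inside a dual-encoder, single-decoder layout could look like, the Python sketch below assumes a sigmoid gate computed from each RGB channel that rescales the matching depth channel, plus a residual path. The gate design and the names `fast_guidance` and `fuse_encoder_outputs` are hypothetical and are not taken from the paper.

```python
import math
from typing import List

Feature = List[float]  # hypothetical: one feature vector (channels) at a single pixel


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fast_guidance(depth_feat: Feature, rgb_feat: Feature) -> Feature:
    # Hypothetical low-cost guidance: a gate in (0, 1) is computed from each RGB
    # channel and rescales the matching depth channel; the residual term keeps
    # the original depth feature. Cost: one sigmoid and one multiply-add per channel.
    if len(depth_feat) != len(rgb_feat):
        raise ValueError("depth and RGB features need the same channel count")
    return [d + sigmoid(r) * d for d, r in zip(depth_feat, rgb_feat)]


def fuse_encoder_outputs(depth_scales: List[Feature], rgb_scales: List[Feature]) -> List[Feature]:
    # The two encoders each yield one feature per scale; guidance is applied
    # scale by scale, and the fused list is what a single decoder would consume.
    if len(depth_scales) != len(rgb_scales):
        raise ValueError("both encoders must produce the same number of scales")
    return [fast_guidance(d, r) for d, r in zip(depth_scales, rgb_scales)]


if __name__ == "__main__":
    depth_scales = [[2.0, -1.0], [4.0]]  # toy depth-encoder outputs at two scales
    rgb_scales = [[0.0, 0.0], [0.0]]     # toy RGB-encoder outputs at the same scales
    print(fuse_encoder_outputs(depth_scales, rgb_scales))  # [[3.0, -1.5], [6.0]]
```

With an all-zero RGB feature every gate equals 0.5, so each depth channel is scaled by 1.5. Because the per-channel cost is a single sigmoid and multiply-add, a fusion of this shape adds little latency, which is the kind of budget a real-time method aims for.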
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses image-guided depth completion in autonomous driving: converting sparse depth maps into dense depth predictions to strengthen the perception capabilities of autonomous driving systems. The task poses three main challenges:

1. **Effective fusion of two modalities**: how to fuse RGB images and sparse depth maps so that the image improves the recovery of depth information.
2. **Better depth recovery**: how to recover depth information more accurately during the fusion process.
3. **Real-time prediction**: how to keep inference fast enough for practical autonomous driving while maintaining high accuracy.

To address these issues, the authors propose a concise yet high-performing network, CHNet (called CENet in the abstract above). Its main features are:

- **Fast guidance module**: it fuses the features of the two sensors quickly by exploiting the rich auxiliary information in the color space. Compared with existing complex guidance modules, it takes an intuitive, low-cost approach.
- **Decoupled depth prediction head**: the authors identify and analyze an optimization inconsistency between observed and unobserved positions, and introduce a decoupled head that better predicts the depth of valid and invalid positions while adding very little inference time.
- **Dual-encoder, single-decoder structure**: this simple structure gives CHNet a strong balance between accuracy and computational efficiency.

On the KITTI depth completion benchmark, CHNet achieves competitive accuracy and inference speed, and it also performs well on the indoor NYUv2 dataset. These results support the effectiveness and generalization of the method.
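The optimization inconsistency between observed and unobserved positions can be made concrete by scoring the two kinds of pixels separately. The Python sketch below assumes that a pixel counts as observed when its sparse input depth is positive, and that a decoupled head produces two candidate maps merged by that validity mask. This merging rule and the helper names (`validity_mask`, `merge_decoupled`, `masked_mae`) are hypothetical illustrations, not the published design of the network.

```python
from typing import List

Grid = List[List[float]]  # a small 2-D depth map, row-major
Mask = List[List[bool]]


def validity_mask(sparse: Grid) -> Mask:
    # A pixel is "observed" when the sparse LiDAR map carries a depth value (> 0).
    return [[d > 0.0 for d in row] for row in sparse]


def merge_decoupled(observed_head: Grid, unobserved_head: Grid, mask: Mask) -> Grid:
    # Hypothetical decoupled head: take the observed-branch value at observed
    # pixels and the unobserved-branch value everywhere else.
    return [
        [o if m else u for o, u, m in zip(o_row, u_row, m_row)]
        for o_row, u_row, m_row in zip(observed_head, unobserved_head, mask)
    ]


def masked_mae(pred: Grid, gt: Grid, mask: Mask, want_observed: bool) -> float:
    # Mean absolute error over pixels whose observed flag equals want_observed
    # and that have ground truth (gt > 0), so the two regions are scored apart.
    errors = [
        abs(p - g)
        for p_row, g_row, m_row in zip(pred, gt, mask)
        for p, g, m in zip(p_row, g_row, m_row)
        if m == want_observed and g > 0.0
    ]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    sparse = [[5.0, 0.0], [0.0, 10.0]]  # toy 2x2 map with two LiDAR hits
    gt = [[5.0, 6.0], [8.0, 10.0]]      # toy dense ground truth
    observed_head = [[5.1, 3.0], [3.0, 9.9]]
    unobserved_head = [[4.0, 6.5], [7.5, 11.0]]
    mask = validity_mask(sparse)
    pred = merge_decoupled(observed_head, unobserved_head, mask)
    print(pred)  # [[5.1, 6.5], [7.5, 9.9]]
    print(round(masked_mae(pred, gt, mask, True), 3), masked_mae(pred, gt, mask, False))  # 0.1 0.5
```

In this toy grid the observed pixels end up with an error of about 0.1 and the unobserved pixels with 0.5. Reporting the two regions separately is one simple way to see how a single shared output could be optimized unevenly across them, which is the motivation the authors give for decoupling the head.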