Depth-Guided Aggregation for Real-Time Binocular Depth Estimation Network

Dongxin Fu, Shaowu Zheng, Pengcheng Xie, Weihua Li
DOI: https://doi.org/10.1109/mmul.2024.3395695
IF: 3.4911
2024-07-13
IEEE Multimedia
Abstract: Using binocular cameras to obtain depth information of target pixels offers a cost-effective and natural alternative to lidar systems. However, most current binocular depth estimation networks struggle to balance speed and accuracy in real-world situations, and their prediction accuracy for long-range depth is often limited. In this article, we introduce the end-to-end real-time depth estimation network (RTDENet), which efficiently utilizes multiscale cost volumes for improved performance. We propose an efficient and flexible cost aggregation module that supplements residual information with high-resolution cost volumes. By replacing some computationally demanding 3-D convolutional layers with depth-guided excitation, we maintain accuracy while keeping the model's computational cost under control. Combined with a distance-sensitive loss function, RTDENet achieves a global difference of 2.41 m and an inference time of 27 ms on the KITTI Stereo dataset. This balance of speed and accuracy outperforms other state-of-the-art algorithms in depth estimation tasks.
computer science, information systems, theory & methods, software engineering, hardware & architecture
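The abstract notes that long-range depth accuracy is often limited in binocular depth estimation. The sketch below illustrates why, using the standard pinhole stereo relation Z = f * B / d, which is general stereo geometry and not taken from the paper. The function names are hypothetical, and the focal length and baseline are assumed, roughly KITTI-like values chosen only for illustration; none of this represents RTDENet itself.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Convert a disparity in pixels to metric depth via Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px=1.0):
    """First-order depth error for a disparity error: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Assumed illustrative values (roughly KITTI-like), not taken from the paper.
FOCAL_PX = 721.0
BASELINE_M = 0.54

for depth in (10.0, 40.0, 80.0):
    err = depth_error(depth, FOCAL_PX, BASELINE_M)
    print(f"depth {depth:5.1f} m: about {err:.2f} m error per pixel of disparity error")
```

Because the error term grows with the square of the depth, the same one-pixel disparity mistake costs far more at 80 m than at 10 m, which is one plausible motivation for weighting a loss function by distance.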