Monocular Depth Estimation Based on Residual Pooling and Global-Local Feature Fusion

Linke Li, Zhengyou Liang, Xinyu Liang, Shun Li
DOI: https://doi.org/10.1109/access.2024.3453942
IF: 3.9
2024-09-10
IEEE Access
Abstract: To improve the prediction accuracy of monocular depth estimation networks and to address issues such as edge blurring and excessive artifacts in the generated depth maps, this paper proposes a deep network architecture based on a global-local feature fusion module and a residual pooling module. The encoder is a hierarchical Transformer, while the decoder is a U-Net-style structure that combines multi-dimensional attention feature aggregation with residual pooling. The residual pooling module extracts background contextual information from the feature maps more effectively, yielding more precise scene depth information. The global-local feature fusion module enables the network to learn features that encompass both global and local information. Experimental evaluations on the NYU Depth V2 and KITTI datasets show that the proposed method achieves a score of 0.916 on the NYU Depth V2 dataset, along with enhanced generalization ability and robustness. Furthermore, ablation studies on the NYU Depth V2 dataset validate the effectiveness of each module.
Categories: Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic