A Dual Encoder–Decoder Network for Self-Supervised Monocular Depth Estimation

Mingkui Zheng, Lin Luo, Haifeng Zheng, Zhangfan Ye, Zhe Su
DOI: https://doi.org/10.1109/jsen.2023.3296497
IF: 4.3
2023-01-01
IEEE Sensors Journal
Abstract: Depth estimation from a single image is a fundamental problem in computer vision. With the great success of deep learning techniques, various self-supervised monocular depth estimation methods using encoder–decoder architectures have emerged. However, most previous approaches regress the depth map directly with a single encoder–decoder structure, which may not extract sufficient features from the image and can result in a depth map with low accuracy and blurred details. To improve the accuracy of self-supervised monocular depth estimation, we propose a simple but effective scheme based on a dual encoder–decoder network. Specifically, we introduce a novel global feature extraction network (GFN) to extract global features from images. GFN consists of PoolAttentionFormer and ResBlock modules, which work together to extract hierarchical global features and fuse them into the depth estimation network (DEN). To further improve the accuracy, we design two feature fusion mechanisms: global feature fusion and multiscale fusion. Experiments with various dual encoder–decoder combination schemes on the KITTI dataset show that the proposed scheme improves the accuracy of self-supervised monocular depth estimation, reaching 89.6% on the threshold accuracy metric ( $\delta < 1.25$ ).
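For readers unfamiliar with the $\delta < 1.25$ figure, it is the standard threshold accuracy used in depth estimation benchmarks: the fraction of pixels with valid ground truth for which $\max(\hat{d}/d,\, d/\hat{d}) < 1.25$, where $\hat{d}$ is the predicted depth and $d$ the ground-truth depth. The following is a minimal sketch of that metric only, not the authors' evaluation code. The function name `delta_accuracy`, the flat-list input format, and the convention of skipping pixels with non-positive ground truth are assumptions made for illustration.

```python
def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold.

    pred, gt: flat sequences of depth values of equal length (assumed format).
    Pixels whose ground truth is not positive are treated as invalid and skipped.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    hits = 0
    valid = 0
    for p, g in zip(pred, gt):
        if g <= 0:
            continue  # no valid ground truth at this pixel
        valid += 1
        # A non-positive prediction can never satisfy the ratio test.
        if p > 0 and max(p / g, g / p) < threshold:
            hits += 1
    if valid == 0:
        raise ValueError("no pixels with valid ground truth")
    return hits / valid


if __name__ == "__main__":
    # Hypothetical depths in metres; the last pixel has no ground truth.
    pred = [10.0, 20.0, 5.0, 8.0]
    gt = [10.0, 30.0, 4.5, 0.0]
    print(f"delta < 1.25: {delta_accuracy(pred, gt):.3f}")
```

In this toy input, two of the three valid pixels pass the ratio test. Under this definition, a reported value of 89.6% means that 89.6% of evaluated pixels have a predicted depth within a factor of 1.25 of the ground truth.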