Self-supervised Monocular Depth Estimation with Multi-Scale Feature Fusion

Qiannan Yan, Xuezhi Xiang
DOI: https://doi.org/10.1109/icma57826.2023.10216059
2023-01-01
Abstract: Self-supervised monocular depth estimation shows great potential, as it does not use ground-truth depth as supervision. Depth is key information for scene understanding; however, real scenes are often complex, and the scales of different targets vary greatly. To alleviate the problems caused by scale changes and small targets, we propose a depth estimation method based on multi-scale feature fusion, which integrates the encoding and decoding features at the same level more thoroughly. Specifically, we design a multi-scale feature fusion (MSFF) module with two branches that perform global context aggregation and local context aggregation on the features, respectively. By further fusing the information from these two branches, the network can attend simultaneously to large targets, which are distributed more globally, and small targets, which are distributed more locally. We conduct a series of experiments on the KITTI dataset, demonstrating that our method achieves competitive results. The visualization results show that our method obtains high-quality depth maps even when the scales of targets in the scene vary greatly.
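The two-branch fusion described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the internals of the MSFF module, so everything below is an assumption made for illustration: the function names (`pointwise`, `msff_sketch`), the bottleneck width, the use of global average pooling for the global branch, per-pixel channel mixing for the local branch, and the sigmoid gate that blends encoder and decoder features. It shows one plausible attention-style fusion, not the authors' implementation.

```python
import numpy as np

def pointwise(x, w1, w2):
    """Two 1x1 'convolutions' (channel mixing) with a ReLU in between.

    x: (C, H, W); w1: (C_mid, C); w2: (C, C_mid). Hypothetical helper.
    """
    hidden = np.maximum(np.einsum("mc,chw->mhw", w1, x), 0.0)
    return np.einsum("cm,mhw->chw", w2, hidden)

def msff_sketch(enc, dec, params):
    """Fuse same-level encoder/decoder features of shape (C, H, W).

    Assumed design: a global branch and a local branch produce a gate
    that decides, per channel and pixel, how to mix `enc` and `dec`.
    """
    x = enc + dec
    # Global branch: average over all positions, then mix channels -> (C, 1, 1).
    global_ctx = pointwise(x.mean(axis=(1, 2), keepdims=True), *params["global"])
    # Local branch: mix channels independently at every pixel -> (C, H, W).
    local_ctx = pointwise(x, *params["local"])
    # Fuse both branches into a gate in (0, 1); broadcasting adds (C,1,1) to (C,H,W).
    gate = 1.0 / (1.0 + np.exp(-(global_ctx + local_ctx)))
    return gate * enc + (1.0 - gate) * dec

# Toy usage with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
C, H, W, mid = 8, 4, 6, 2
params = {
    name: (rng.normal(size=(mid, C)), rng.normal(size=(C, mid)))
    for name in ("global", "local")
}
enc = rng.normal(size=(C, H, W))
dec = rng.normal(size=(C, H, W))
fused = msff_sketch(enc, dec, params)
print(fused.shape)
```

In this sketch the global branch contributes one value per channel, reflecting scene-wide statistics that suit large targets, while the local branch contributes a value per pixel, which preserves detail around small targets. Because the gate lies strictly between 0 and 1, each output value is a convex combination of the corresponding encoder and decoder values.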