RGB-Depth Structure Similarity for Self-supervised Monocular Depth Estimation

Lulu Zhang, Meng Yang
DOI: https://doi.org/10.1109/rcar58764.2023.10250088
2023-01-01
Abstract: Monocular depth estimation is a fundamental technique that lets robots perceive real (unseen) scenes. Supervised methods rely on large-scale datasets with ground-truth (GT) depth labels and generalize poorly to other scenes. A dominant solution is to train the model directly on the target scenes in a self-supervised way with pseudo depth labels (e.g., generated by stereo matching). However, pseudo depth labels are often unreliable, especially near object boundaries, which can disturb training and consequently degrade depth quality at inference. In this paper, we investigate RGB-Depth structure similarity based on Gaussian kernels, because the structure of the RGB image is always reliable. This RGB-Depth structure similarity measure is then used to improve self-supervised depth estimation in two ways. First, it measures the confidence of the pseudo depth labels and filters out unreliable pixels. Second, it constrains the structure of the predicted depth maps in the loss. Experiments on the KITTI Eigen split datasets verify that the proposed method achieves better or comparable quantitative results, and always achieves better visual results with clear depth boundaries, compared with five recent baselines.
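The abstract does not state the similarity formula, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes that "structure" can be approximated by Gaussian-kernel affinities between each pixel and its four neighbours, computed separately on the RGB image and on the depth map, and that similarity is one minus the mean absolute difference of those affinities. The function names, the sigma values, and the 0.9 threshold are all hypothetical.

```python
import numpy as np

# Hypothetical 4-neighbourhood used to probe local structure.
SHIFTS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def gaussian_affinity(img, sigma):
    """Gaussian-kernel affinity of each pixel to its shifted neighbours.

    Returns an array of shape (H, W, len(SHIFTS)) with values in (0, 1].
    np.roll wraps around at the image border, which is a simplification.
    """
    img = np.asarray(img, dtype=np.float64)
    if img.ndim == 2:  # depth map: add a channel axis
        img = img[..., None]
    affinities = []
    for dy, dx in SHIFTS:
        neighbour = np.roll(img, shift=(dy, dx), axis=(0, 1))
        sq_dist = np.sum((img - neighbour) ** 2, axis=-1)
        affinities.append(np.exp(-sq_dist / (2.0 * sigma ** 2)))
    return np.stack(affinities, axis=-1)


def structure_similarity(rgb, depth, sigma_rgb=0.1, sigma_depth=0.1):
    """Per-pixel similarity in [0, 1]; 1 means identical local structure."""
    a_rgb = gaussian_affinity(rgb, sigma_rgb)
    a_depth = gaussian_affinity(depth, sigma_depth)
    return 1.0 - np.mean(np.abs(a_rgb - a_depth), axis=-1)


def confidence_mask(rgb, pseudo_depth, threshold=0.9):
    """Boolean mask that keeps pixels whose pseudo depth agrees with the RGB structure."""
    return structure_similarity(rgb, pseudo_depth) >= threshold
```

Under these assumptions, a pseudo depth label whose boundary is displaced from the corresponding RGB edge receives a low similarity and is masked out. The same similarity map could in principle serve as a penalty on the predicted depth, for example the mean of one minus the similarity, though the abstract does not say how the loss term is actually built.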