Abstract: Depth estimation is an essential component of computer vision applications for environment perception, 3D reconstruction and scene understanding. Among the available methods, self-supervised monocular depth estimation is noteworthy for its cost-effectiveness, ease of installation and data accessibility. However, current methods face two challenges. Firstly, the scale factor of self-supervised monocular depth estimation is uncertain, which poses significant difficulties for practical applications. Secondly, the depth prediction accuracy for high-resolution images is still unsatisfactory, resulting in inefficient use of computational resources. We propose a novel solution to these challenges with three specific contributions. Firstly, an interleaved skip-connection structure for the depth network and a new depth network decoder are proposed to improve the depth prediction accuracy for high-resolution images. Secondly, a data vertical splicing module is introduced as a data augmentation method to obtain more non-vertical features and improve model generalization. Lastly, a scale recovery module is proposed to recover accurate absolute depth without additional sensors, which resolves the uncertainty in the scale factor. The experimental results demonstrate that the proposed framework significantly improves the prediction accuracy for high-resolution images. In particular, the novel network structure and the data vertical splicing module contribute significantly to this improvement. Moreover, in a scenario where the camera height is fixed and the ground is flat, the performance of the scale recovery module is comparable to that achieved by scaling with ground truth. Overall, the proposed RSANet framework offers a promising solution to the existing challenges in self-supervised monocular depth estimation.
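The abstract does not describe how the scale recovery module works internally. As a rough illustration of the general idea behind camera-height-based scale recovery (fixed camera height, flat ground), the following minimal sketch fits a plane to back-projected ground pixels of a relative depth map and compares the resulting camera height with the known one. It is not the paper's implementation: the function names, the least-squares plane fit, and the assumption that a ground mask is already available are all hypothetical choices made for illustration. The ground-truth median scaling that such a module is usually compared against is included for contrast.

```python
import numpy as np


def backproject(depth, K):
    """Lift a depth map of shape (H, W) to camera-frame 3D points (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T  # per-pixel rays with z = 1
    return rays * depth[..., None]


def estimate_camera_height(depth, K, ground_mask):
    """Distance from the camera centre to a plane fitted to the ground points.

    Assumption: ground_mask (bool, H x W) marks pixels lying on flat ground.
    """
    pts = backproject(depth, K)[ground_mask]
    centroid = pts.mean(axis=0)
    # The plane normal is the direction of least variance of the points.
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = vt[-1]
    return abs(normal @ centroid)


def recover_scale(depth, K, ground_mask, true_camera_height):
    """Scale factor that maps relative depth to metric depth (hypothetical helper)."""
    return true_camera_height / estimate_camera_height(depth, K, ground_mask)


def median_scale(pred, gt):
    """Ground-truth median scaling, the usual baseline for comparison."""
    valid = gt > 0
    return np.median(gt[valid]) / np.median(pred[valid])
```

Multiplying the relative depth map by the value returned from `recover_scale` gives a metric estimate under the stated flat-ground assumption; the estimate degrades when the ground mask contains non-ground pixels or the road is not planar.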

AVS-Net: Audio-Visual Scale Net for Self-supervised Monocular Metric Depth Estimation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Monocular Depth Estimation Based on Unsupervised Learning

SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning

Unsupervised Scale-Consistent Depth Learning from Video

A self‐supervised monocular depth estimation model with scale recovery and transfer learning for construction scene analysis

RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation

ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation

Resolution-sensitive self-supervised monocular absolute depth estimation

Self-Supervised Monocular Depth Estimation With Multiscale Perception

Cascade Network for Self-Supervised Monocular Depth Estimation

Self-supervised Monocular Depth and Visual Odometry Learning with Scale-consistent Geometric Constraints

Self-Supervised Monocular Depth Estimation with Binary Mask and Lightweight Network

MDSNet: self-supervised monocular depth estimation for video sequences using self-attention and threshold mask

Monocular Depth and Ego-motion Estimation with Scale Based on Superpixel and Normal Constraints

Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

Boosting Monocular Depth Estimation with Sparse Guided Points

Monocular Depth Estimation via Self-Supervised Self-Distillation

Towards Zero-Shot Scale-Aware Monocular Depth Estimation

Dyna-MSDepth: multi-scale self-supervised monocular depth estimation network for visual SLAM in dynamic scenes

Digging Into Self-Supervised Monocular Depth Estimation