Abstract: Depth estimation is crucial for scene understanding and downstream tasks, and self-supervised training methods in particular show great potential. Both the overall structure and the local details of a scene are essential for improving the quality of depth estimation. Monodepth2 brought significant progress in self-supervised monocular depth estimation. However, Monodepth2 uses the most basic encoder–decoder architecture. The limited information flow in the network leads to a large semantic gap between the encoder and the decoder, which reduces the network's accuracy in recognizing fine-grained features. Monodepth2 adopts ResNet18 pre-trained on the ImageNet dataset as the encoder. This traditional convolution-and-pooling structure loses pixel information at every scale. To solve this problem, this paper proposes an improved DepthNet. The network adopts HRNet, used in semantic segmentation, as the base encoder; HRNet applies multi-scale fusion throughout the whole process, thus avoiding the loss of pixel information. An additional densely connected U-Net is employed on the decoder side to provide more information flow. Furthermore, the semantic gap between the encoder and decoder is reduced by adding different numbers of residual connections and channel attention on each layer. The network structure can be regarded as a collection of fully convolutional networks. Since the deep features of the network correlate more strongly with vertical position, we add a spatial location attention module to the deep-level network to reduce this semantic gap. The approach performs well on the KITTI benchmark, with several performance criteria comparable to those of supervised monocular depth inference methods.
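
The abstract does not specify how its channel attention is built. As an illustration only, the following minimal sketch assumes a squeeze-and-excitation style gate: each channel is globally average-pooled, passed through a small bottleneck with ReLU, and turned into a sigmoid weight that rescales the channel. The function name `channel_attention`, the weight matrices `w1` and `w2`, and the bias-free bottleneck are all hypothetical choices, not the paper's implementation.

```python
import math


def channel_attention(features, w1, w2):
    """Reweight the channels of a C x H x W feature map given as nested lists.

    Assumed (hypothetical) shapes: w1 is (C // r) x C and w2 is C x (C // r),
    with no bias terms. Returns the rescaled map and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excite: bottleneck layer with ReLU, then one sigmoid gate per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in channel]
        for g, channel in zip(gates, features)
    ]
    return scaled, gates


if __name__ == "__main__":
    # Toy 2-channel, 2x2 feature map with a 1-unit bottleneck.
    toy = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
    out, gates = channel_attention(toy, w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]])
    print(gates)
    print(out)
```

In a real network the weights would be learned and the operation would run on tensors; the nested-list version here only makes the three steps (squeeze, excite, scale) explicit.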

Deep Neighbor Layer Aggregation for Lightweight Self-Supervised Monocular Depth Estimation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

LDA-Mono: A Lightweight Dual Aggregation Network for Self-Supervised Monocular Depth Estimation

MonoBooster: Semi-Dense Skip Connection with Cross-Level Attention for Boosting Self-Supervised Monocular Depth Estimation

Lightweight Self-Supervised Monocular Depth Estimation Through CNN and Transformer Integration

TinyDepth: Lightweight Self-Supervised Monocular Depth Estimation Based on Transformer

Lightweight Monocular Absolute Depth Estimation Based on Attention Mechanism

DCU-NET: Self-supervised Monocular Depth Estimation Based on Densely Connected U-shaped Convolutional Neural Networks

DepthFormer: Exploiting Long-range Correlation and Local Information for Accurate Monocular Depth Estimation

CCDepth: A Lightweight Self-supervised Depth Estimation Network with Enhanced Interpretability

MFCS-Depth: an Economical Self-Supervised Monocular Depth Estimation Based on Multi-Scale Fusion and Channel Separation Attention

Self-supervised Monocular Depth Estimation Based on Combining Convolution and Multilayer Perceptron

HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation

Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

Monocular Depth Estimation With Affinity, Vertical Pooling, And Label Enhancement

LW-Net: A Lightweight Network for Monocular Depth Estimation

Novel Hybrid Neural Network for Dense Depth Estimation Using On-Board Monocular Images

FA-Depth: Toward Fast and Accurate Self-supervised Monocular Depth Estimation

SAU-Net: Monocular Depth Estimation Combining Multi-Scale Features and Attention Mechanisms

RTIA-Mono: Real-Time Lightweight Self-Supervised Monocular Depth Estimation with Global-Local Information Aggregation

Lightweight Monocular Depth Estimation with an Edge Guided Network