Abstract: Given a monocular image as input, monocular depth estimation (MDE) infers pixel-level depth. MDE is a critical stage of scene sensing on edge devices. Existing MDE studies frequently employ deep neural networks (DNNs), but they still face a trade-off: they either sacrifice efficiency, incurring high computational complexity, in return for high precision, or give up considerable precision in exchange for higher efficiency. To alleviate these issues, 1) we propose an encoder–decoder network (EdgeNet) for precise and fast MDE on different edge devices. When recovering depth in the decoder, we design upsampling modules that aggregate global depth information with low computational complexity, improving the accuracy of the decoder by extracting depth information over different ranges; 2) we develop a two-stage channel pruning method that prunes the encoder and the decoder separately, according to their respective characteristics. Our pruning method further reduces the latency and the model and computational complexity of EdgeNet while losing little accuracy; and 3) we optimize the pruned EdgeNet to decrease graphics processing unit (GPU) scheduling overhead. This optimization accelerates MDE inference by an order of magnitude on the TX2 GPU device when the input resolution is 224 × 224. Extensive experiments show that our strategies are effective on different edge GPU devices at different input resolutions, in both outdoor and indoor scenes. For example, compared with the state of the art, the optimized EdgeNet reduces GPU latency by 76.3% on the Nano and by 89.2% on the TX2 GPU device, with 2.6% lower root mean square error, when the input resolution is 128 × 416.
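The abstract does not state which criterion the two-stage channel pruning uses. The sketch below assumes a common magnitude-based criterion (ranking each convolution's output channels by the L1 norm of their filters) and applies it in two stages, with a separate keep ratio for the encoder and for the decoder. The function names, the keep ratios, the toy layer shapes, and the plain sequential layer chain (no skip connections) are all hypothetical; this illustrates the general technique, not EdgeNet's actual pruning rule.

```python
import numpy as np

# Hypothetical sketch: the L1-norm criterion, keep ratios, and the plain
# sequential layer chain are assumptions, not EdgeNet's published method.


def channel_l1_scores(weight: np.ndarray) -> np.ndarray:
    """Per-output-channel L1 norm of a conv weight shaped (out_ch, in_ch, kh, kw)."""
    return np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)


def select_channels(weight: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Sorted indices of the output channels with the largest L1 norms."""
    n_keep = max(1, int(round(weight.shape[0] * keep_ratio)))
    order = np.argsort(channel_l1_scores(weight))[::-1]  # most important first
    return np.sort(order[:n_keep])


def prune_stage(layers: list, indices: range, keep_ratio: float) -> None:
    """Prune output channels of layers[i] for each i in indices, in place.

    The next layer's input channels are sliced to match, so every layer in
    the chain stays shape-consistent.
    """
    for i in indices:
        keep = select_channels(layers[i], keep_ratio)
        layers[i] = layers[i][keep]
        layers[i + 1] = layers[i + 1][:, keep]


def two_stage_prune(encoder: list, decoder: list, enc_ratio: float, dec_ratio: float):
    """Stage 1 prunes the encoder, stage 2 the decoder, each with its own ratio.

    Assumes a non-empty decoder; its final layer (the depth output) is never pruned.
    """
    layers = [w.copy() for w in encoder + decoder]
    n_enc = len(encoder)
    prune_stage(layers, range(n_enc), enc_ratio)                   # stage 1: encoder
    prune_stage(layers, range(n_enc, len(layers) - 1), dec_ratio)  # stage 2: decoder
    return layers[:n_enc], layers[n_enc:]


# Toy chain: encoder 3 -> 16 -> 32 channels, decoder 32 -> 16 -> 1 (depth map).
rng = np.random.default_rng(0)
encoder = [rng.normal(size=(16, 3, 3, 3)), rng.normal(size=(32, 16, 3, 3))]
decoder = [rng.normal(size=(16, 32, 3, 3)), rng.normal(size=(1, 16, 3, 3))]
enc, dec = two_stage_prune(encoder, decoder, enc_ratio=0.5, dec_ratio=0.75)
print([w.shape for w in enc + dec])
# prints [(8, 3, 3, 3), (16, 8, 3, 3), (12, 16, 3, 3), (1, 12, 3, 3)]
```

Giving each stage its own ratio is one simple way to reflect that the encoder and decoder may tolerate different amounts of pruning. A real encoder–decoder with skip connections would also need the added or concatenated feature maps sliced consistently, which this sketch leaves out.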
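The latency results are reported as percentage reductions relative to the state of the art. If latency drops by a fraction r, the new latency is (1 − r) times the old one, so the implied speedup is 1/(1 − r). The small helper below (its name is illustrative) applies this arithmetic to the two reductions reported for 128 × 416 inputs; it uses no numbers beyond those in the abstract.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Speedup implied by a fractional latency reduction r, since t_new = (1 - r) * t_old."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return 1.0 / (1.0 - reduction)


# Latency reductions reported in the abstract (128 x 416 input).
for device, r in [("Nano", 0.763), ("TX2", 0.892)]:
    print(f"{device}: {r:.1%} lower latency -> {speedup_from_reduction(r):.1f}x faster")
# prints:
# Nano: 76.3% lower latency -> 4.2x faster
# TX2: 89.2% lower latency -> 9.3x faster
```

The order-of-magnitude acceleration on the TX2 refers to a different comparison (the pruned EdgeNet with and without the scheduling optimization, at 224 × 224), so it cannot be read off these two figures.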
