Light-Weight Monocular Depth Estimation by Non-Local Decoder-Squeeze-and-Excitation Network

Hsiu-Wei Su, Tsung-Han Tsai, Yz-Heng Lin, Wei-Chung Wan
DOI: https://doi.org/10.1109/ICCE-Taiwan62264.2024.10674644
2024-07-09
Abstract: Monocular depth estimation is an important topic in computer vision. Recently, models based on convolutional neural networks (CNNs) have shown reasonable results with end-to-end encoder-decoder architectures. In our prior work, we proposed the Non-Local Decoder-Squeeze-and-Excitation (NL-DSE) network [1]. NL-DSE is built on an Efficient-Net-B5 encoder network, and its computational complexity is still high. In this paper, we aim to achieve lightweight depth estimation. To accomplish this, we replace Efficient-Net-B5 with different encoder networks and compare the performance of the resulting variants. We evaluate the accuracy of each variant on the NYU Depth V2 dataset and use an Nvidia AGX Xavier as the edge device to measure FLOPs and frame rate. Finally, we select Efficient-Net-B0 as the encoder network to achieve lightweight monocular depth estimation.
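The abstract does not say which accuracy measures are reported. As a minimal sketch, assuming the metrics commonly used on NYU Depth V2 (threshold accuracy δ < 1.25, absolute relative error, and RMSE), the per-encoder evaluation might look like the following. The function name `depth_metrics` and the flat-list input format are hypothetical and not taken from the paper.

```python
import math


def depth_metrics(pred, gt):
    """Compute common depth-estimation metrics (assumed, not from the paper).

    pred, gt: flat sequences of predicted and ground-truth depths in metres.
    Pixels with non-positive ground truth or prediction are skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)

    # Threshold accuracy: fraction of pixels with max(p/g, g/p) < 1.25.
    delta1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n
    # Mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root-mean-square error in the same unit as the input depths.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    return {"delta1": delta1, "abs_rel": abs_rel, "rmse": rmse}


if __name__ == "__main__":
    # Toy values only; the last pixel has no ground truth and is ignored.
    print(depth_metrics([1.0, 2.0, 4.0, 3.0], [1.0, 2.0, 2.0, 0.0]))
```

Running the same function over the predictions of each candidate encoder would give directly comparable accuracy figures, which could then be weighed against the FLOPs and frame rate measured on the edge device.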
Computer Science