LDA-Mono: A Lightweight Dual Aggregation Network for Self-Supervised Monocular Depth Estimation

Bowen Zhao, Hongdou He, Hang Xu, Peng Shi, Xiaobing Hao, Guoyan Huang
DOI: https://doi.org/10.1016/j.knosys.2024.112552
2024-01-01
Abstract: Monocular depth estimation plays a crucial role in various computer vision and robotics applications, and self-supervised methods are of particular interest because they do not require ground-truth depth maps. The scene structure and local details are important cues for accurate depth estimation. Most recent self-supervised monocular depth estimation studies have used convolutional neural networks or networks fused with Transformers for inference. However, they primarily model global contextual relationships and lack accurate scene structure perception and proper local detail processing. To address these problems, we propose LDA-Mono, a lightweight dual-aggregation self-supervised depth estimation network. First, in response to edge blurring, we propose the Consecutive Adaptive Dilated Convolution (CADC) module. It utilises multi-layer dilated convolutions to expand the receptive field of the network and select features adaptively. This enhances the key local details and effectively fuses the multi-scale features. Second, our proposed Dual Feature Aggregation (DFA) module employs dual self-attention for long-range global context modelling. Features are aggregated along the spatial and channel dimensions to improve the perception and representation of complex scene structures. With this architecture, the proposed model generates more accurate depth estimates with clearer details. The experimental results show that LDA-Mono significantly outperforms other state-of-the-art methods in terms of accuracy, while its model size is only 23% of that of the lightweight Lite-Mono. In addition, extensive experiments verify the effectiveness of the proposed method and demonstrate its excellent generalisability to other datasets.
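The abstract states that CADC stacks dilated convolutions to expand the receptive field. The following minimal sketch only illustrates that general arithmetic; it is not the authors' implementation. The function name `receptive_field`, the 3x3 kernel size, and the dilation rates (1, 2, 3) are assumptions chosen for illustration, since the abstract does not specify them.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of a stack of stride-1 dilated convolutions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rf = 1
    for d in dilations:
        # A kernel of size k with dilation d spans d * (k - 1) + 1 pixels,
        # so each stride-1 layer widens the receptive field by d * (k - 1).
        rf += d * (kernel_size - 1)
    return rf


# Three ordinary 3x3 convolutions (dilation 1 everywhere).
plain = receptive_field(3, [1, 1, 1])

# Same depth and parameter count, with assumed dilation rates 1, 2, 3.
dilated = receptive_field(3, [1, 2, 3])

print(plain, dilated)
```

Under these assumed settings, the dilated stack covers a wider window than the plain stack with the same number of weights, which is the general reason dilated convolutions suit lightweight networks.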