Dual-attention-based semantic-aware self-supervised monocular depth estimation

Jinze Xu, Feng Ye, Yizong Lai
DOI: https://doi.org/10.1007/s11042-023-17976-1
IF: 2.577
2024-01-14
Multimedia Tools and Applications
Abstract: Self-supervised monocular depth estimation, which relies on the assumption of photometric consistency, has been widely studied because it avoids costly annotations. However, it is sensitive to noise, occlusion, and photometric changes. To overcome these problems, we propose a multi-task model with a dual-attention-based cross-task feature fusion module (DCFFM). We simultaneously predict depth and semantics with a shared encoder and two separate decoders, aiming to improve depth estimation through additional semantic supervision. In DCFFM, we fuse the cross-task features with both pixel-wise and channel-wise attention, so that each task fully exploits the helpful information from the other. We compute both attentions in a one-to-all manner to capture global information while limiting the growth of computation. Furthermore, we propose a novel data augmentation method called data exchange & recovery (DE&R), which performs inter-batch data exchange in both the vertical and horizontal directions to increase the diversity of the input data. It encourages the network to explore more diverse cues for depth estimation and to avoid overfitting. Crucially, the corresponding outputs are then recovered to preserve the geometric relationship and ensure that the photometric loss is computed correctly. Extensive experiments on the KITTI and NYU-Depth-v2 datasets demonstrate that our method is effective and outperforms other state-of-the-art methods.
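The exchange-and-recovery idea described in the abstract can be illustrated with a minimal NumPy sketch. This is only one possible reading of the abstract, not the authors' implementation: the function names `exchange` and `recover`, the pairing of each sample with the next one in the mini-batch, and the fixed split position are all assumptions made for illustration.

```python
import numpy as np

def _mix(own, other, axis, split):
    """Keep indices below `split` along `axis` from `own`; take the rest from `other`."""
    shape = [1] * own.ndim
    shape[axis] = -1
    keep = (np.arange(own.shape[axis]) < split).reshape(shape)
    return np.where(keep, own, other)

def exchange(batch, axis, split):
    """Hypothetical exchange step on a (B, C, H, W) batch.

    axis=2 swaps the bottom part (vertical), axis=3 the right part (horizontal).
    Assumption: sample i borrows its far part from sample i+1 (wrapping around).
    """
    return _mix(batch, np.roll(batch, shift=-1, axis=0), axis, split)

def recover(outputs, axis, split):
    """Hypothetical recovery step: undo `exchange` on per-pixel outputs.

    The part that sample i lent away now sits in the output of sample i-1.
    """
    return _mix(outputs, np.roll(outputs, shift=1, axis=0), axis, split)

# Stand-in for a per-pixel network: the identity, so recovery can be checked exactly.
images = np.arange(2 * 1 * 4 * 6, dtype=float).reshape(2, 1, 4, 6)
mixed = exchange(images, axis=3, split=3)      # horizontal exchange
restored = recover(mixed, axis=3, split=3)     # outputs realigned with the inputs
print(np.array_equal(restored, images))
```

With a real network the recovered depth maps would not equal the unmixed predictions, but each recovered map would again correspond to a single source image, which is what a photometric loss between neighbouring frames needs.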
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering