Self-supervised monocular depth estimation via joint attention and intelligent mask loss
Peng Guo, Shuguo Pan, Wang Gao, Kourosh Khoshelham
DOI: https://doi.org/10.1007/s00138-024-01640-1
IF: 2.983
2024-11-29
Machine Vision and Applications
Abstract: Monocular depth estimation is a crucial and challenging task in computer vision, with applications in visual navigation, autonomous vehicles, and robotics. Traditional depth estimation relied on expensive LiDAR and depth cameras, which yield only sparse data. Recently, using Convolutional Neural Networks (CNNs) to derive depth maps from monocular images has attracted significant interest. However, obtaining high-precision depth with a supervised model requires extensive ground-truth datasets for training, which significantly limits the development of this technique. Self-supervised depth estimation is therefore an important research direction, because it removes the reliance on ground-truth data. In this paper, we introduce a new network structure and loss function aimed at improving the precision of depth estimation. Specifically, we design an attention mechanism called SASE-Block, which simultaneously enhances spatial awareness and channel information. Additionally, we propose an intelligent mask that filters out pixels associated with moving objects or pixels that violate camera motion assumptions, preventing them from contaminating the training loss. Experimental results on the KITTI benchmark demonstrate that the model is highly competitive among the latest unsupervised methods and approaches the accuracy of supervised models. We also tested our model in other driving scenarios to evaluate the quality and generalizability of its depth maps.
computer science, cybernetics, artificial intelligence, engineering, electrical & electronic
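
The abstract does not say how the intelligent mask is computed, so the following is only a rough illustration of the general idea of keeping unreliable pixels out of a photometric training loss. It assumes a much simpler, commonly used heuristic (an auto-mask that keeps a pixel only when the warped source frame matches the target better than the unwarped source frame does); it is not the paper's method. The function names and the list-of-lists image representation are hypothetical, chosen to keep the sketch dependency-free.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of intensities in [0, 1]


def abs_error(a: Image, b: Image) -> Image:
    """Per-pixel L1 photometric error between two same-sized images."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def masked_photometric_loss(target: Image, warped: Image, source: Image) -> float:
    """Mean L1 error over the pixels kept by a simple auto-mask.

    ASSUMPTION (not from the paper): a pixel is kept only if the warped
    source explains the target better than the unwarped source does.
    Pixels that already match without warping, such as objects moving
    with the camera, are dropped from the loss.
    """
    reproj = abs_error(target, warped)    # error after view synthesis
    identity = abs_error(target, source)  # error with no warping at all
    kept = [
        r
        for row_r, row_i in zip(reproj, identity)
        for r, i in zip(row_r, row_i)
        if r < i
    ]
    # If every pixel is masked out, contribute nothing to the loss.
    return sum(kept) / len(kept) if kept else 0.0
```

In this sketch, a pixel whose intensity is identical in the target and the unwarped source has zero identity error, so it can never pass the `r < i` test and is excluded. A real training pipeline would apply the same comparison to image tensors and typically combine L1 with a structural similarity term.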