Self-supervised Monocular Image Depth Estimation Primed by Transformer and Multi-scale Attention Scheme

LIANG Shui-bo, LIU Zi-yan, SUN Hao-kun, YUAN Hao, LIANG Jing
DOI: https://doi.org/10.20009/j.cnki.21-1106/TP.2021-0632
2023-01-01
Abstract: To address the blurred edges and contours that current self-supervised monocular depth estimation methods produce on high-resolution images, a monocular image depth estimation network combining a visual Transformer with a multi-scale channel attention scheme is proposed. Firstly, an encoder-decoder model is designed in which multi-scale features are extracted by a visual Transformer encoder. Secondly, a Residual Channel Attention (RCA) decoder is designed to refine the extracted multi-scale features in detail and to merge features from the upper and lower levels, making better use of contextual information. Finally, monocular image depth estimation is performed at multiple scales. On the KITTI dataset, the proposed method produces higher-quality depth maps with clearer contours than current models. The absolute relative error, squared relative error, and root mean square error of the algorithm are 0.119, 0.857, and 4.571, respectively, and the accuracy reaches 0.959, 0.995, and 0.999 at different thresholds. The experimental results demonstrate the feasibility and effectiveness of the proposed algorithm.
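For readers unfamiliar with the reported quantities, the following is a minimal sketch of how the absolute relative error, squared relative error, root mean square error, and threshold accuracy are commonly computed in depth estimation. It is not taken from the paper: the function name `depth_metrics`, the flat-list input format, and the thresholds 1.25, 1.25^2, and 1.25^3 are assumptions based on common practice, since the abstract only says "different thresholds".

```python
import math


def depth_metrics(pred, gt, base=1.25):
    """Compute common depth-estimation metrics (illustrative sketch).

    pred, gt: equal-length sequences of positive depth values.
    base: assumed threshold base; accuracy is reported for base, base^2, base^3.
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(pred, gt))

    # Absolute relative error: mean of |d_pred - d_gt| / d_gt
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Squared relative error: mean of (d_pred - d_gt)^2 / d_gt
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # Root mean square error: sqrt of mean squared difference
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    # Threshold accuracy: fraction of pixels with max(p/g, g/p) < base^k
    ratios = [max(p / g, g / p) for p, g in pairs]
    acc = [sum(r < base ** k for r in ratios) / n for k in (1, 2, 3)]

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "a1": acc[0],
        "a2": acc[1],
        "a3": acc[2],
    }
```

Lower values are better for the three error terms, and higher values are better for the accuracy terms. In practice the inputs would be the valid ground-truth pixels of a KITTI depth map and the corresponding predictions, flattened into sequences.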