Abstract: Monocular depth estimation is widely used in autonomous driving for sensing and obstacle avoidance, and recent advances in deep learning have brought significant progress to the task. However, monocular depth estimation is mainly optimized for the photometric (luminosity) error of pixels and largely disregards the related problems of ambiguous results and boundary artifacts in the image. To address these issues, we developed an improved network model called SAU-Net. Stacking too many convolutional layers in a conventional convolutional network slows the network down and causes primary information to be lost. We therefore propose a convolution-free stratified transformer as the image feature extractor at the encoder end of the network; it restricts self-attention to local windows and uses sliding windows for feature representation, which reduces network latency. To address the loss of critical information, we connect each feature map directly to a feature map from a different scale. In addition, an attention module is introduced to focus on the effective features, which increases the amount of target information in the depth map. During training, we employ a gradient loss function to improve the segmentation accuracy of the network and the smoothness of the output image. Training and testing were conducted on the KITTI dataset. To verify the robustness of the algorithm in practical applications, we also validated it on a campus dataset that we collected. The experimental results show that the accuracy of the algorithm was 89.1%, 96.4%, and 98.5% under the three proportional thresholds. The estimated depth maps have clear details and edges, with fewer artifacts.
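The abstract reports accuracy "under three proportional thresholds" without defining them. A minimal sketch of how such a metric is commonly computed in depth-estimation benchmarks follows. It assumes the conventional thresholds 1.25, 1.25^2, and 1.25^3, which the abstract does not state. The function name `threshold_accuracy` and its parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def threshold_accuracy(pred, gt, base=1.25):
    """Fraction of valid pixels whose depth ratio is below base**k, k = 1, 2, 3.

    Assumption: base = 1.25 is the conventional choice; the abstract
    does not say which thresholds the authors used.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Ignore pixels without ground truth (depth 0) or with non-positive predictions.
    valid = (gt > 0) & (pred > 0)
    p, g = pred[valid], gt[valid]

    # Symmetric ratio: always >= 1, equal to 1 for a perfect prediction.
    ratio = np.maximum(p / g, g / p)

    return tuple(float(np.mean(ratio < base ** k)) for k in (1, 2, 3))


# Toy example with five pixels; the last one has no ground truth and is masked out.
gt = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
pred = np.array([1.0, 1.3, 1.6, 2.0, 5.0])
d1, d2, d3 = threshold_accuracy(pred, gt)
```

Under this reading, the three reported figures would correspond to `d1`, `d2`, and `d3` averaged over the test set. The ratio is symmetric, so over- and under-estimation are penalized equally.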

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Monocular Depth Estimation Based on Unsupervised Learning

A Robust Monocular Depth Estimation Framework Based on Light-Weight ERF-Pspnet for Day-Night Driving Scenes

Self-supervised Monocular Depth Estimation with Coordinate Attention

MDSNet: self-supervised monocular depth estimation for video sequences using self-attention and threshold mask

CCDepth: A Lightweight Self-supervised Depth Estimation Network with Enhanced Interpretability

LA-Net: Layout-Aware Dense Network for Monocular Depth Estimation

MLDA-Net: Multi-Level Dual Attention-Based Network for Self-Supervised Monocular Depth Estimation

Attention-Based Monocular Depth Estimation Considering Global and Local Information in Remote Sensing Images

Self-supervised Monocular Depth Estimation with Large Kernel Attention

PADENet: an Efficient and Robust Panoramic Monocular Depth Estimation Network for Outdoor Scenes

Self-supervised monocular depth estimation via joint attention and intelligent mask loss

Self-supervised coarse-to-fine monocular depth estimation using a lightweight attention module

SAU-Net: Monocular Depth Estimation Combining Multi-Scale Features and Attention Mechanisms

Unsupervised Monocular Estimation of Depth and Visual Odometry Using Attention and Depth-Pose Consistency Loss

Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network

Adaptive Weighted Network With Edge Enhancement Module For Monocular Self-Supervised Depth Estimation

BRNet: Exploring Comprehensive Features for Monocular Depth Estimation

DepthFormer: Exploiting Long-range Correlation and Local Information for Accurate Monocular Depth Estimation

Structure-Attentioned Memory Network for Monocular Depth Estimation