Abstract: Studies have shown that texture details and semantic information in the observed image are important for depth estimation in road scenes. However, previous methods produce ambiguous and inaccurate object boundaries. We therefore aim to design a depth estimation method that achieves higher accuracy and recovers more precise boundaries of the detected objects. Based on polarized self-attention (PSA) and a feature pyramid U-Net, we propose a new self-supervised monocular depth estimation model that extracts more accurate texture details and semantic information. First, we add a PSA module at the end of the depth encoder and the pose encoder so that the network can extract more accurate semantic information. Then, building on the U-Net, we feed the multi-scale images obtained by the FPN (Feature Pyramid Network) object detection module directly into the decoder. This guides the model to learn semantic information and thus sharpens object boundaries in the image. We evaluated our method on the KITTI 2015 and Make3D datasets, and our model achieved better results than previous studies. To verify the generalization of the model, we conducted monocular, stereo, and monocular-plus-stereo experiments. The results show that our model achieves better scores on several main evaluation metrics and produces clearer boundaries. To compare different forms of the PSA mechanism, we conducted ablation experiments. Adding the PSA module yielded better evaluation results than the same model without it. We also found that our model performs better with monocular training than with stereo or monocular-plus-stereo training.
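The abstract does not name the evaluation measures it reports. As an illustration only, the sketch below assumes they are the error and accuracy measures commonly reported for depth estimation on KITTI (absolute relative error, squared relative error, RMSE, RMSE in log space, and the δ < 1.25 accuracy). The function name `depth_metrics` and its list-based interface are hypothetical and are not taken from the paper.

```python
import math


def depth_metrics(pred, gt):
    """Compute common depth-estimation measures over paired depth values.

    `pred` and `gt` are flat sequences of predicted and ground-truth depths
    (e.g. in metres). Pairs with non-positive depth are skipped, since they
    usually mark pixels without valid ground truth.
    This is an assumed metric set, not necessarily the paper's exact protocol.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs to evaluate")
    n = len(pairs)

    # Error measures: lower is better.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )

    # Accuracy measure: fraction of pixels whose ratio to ground truth
    # (in either direction) is below 1.25. Higher is better.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta1": delta1,
    }


if __name__ == "__main__":
    # Toy values, not data from the paper.
    print(depth_metrics([10.0, 20.0, 45.0], [10.0, 20.0, 30.0]))
```

In practice these measures are computed over all valid pixels of a test set, often after cropping and capping depth at a maximum range; those details vary between papers and are not specified in the abstract.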

Object Detection with Depth Information in Road Scenes

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

A Robust Monocular Depth Estimation Framework Based on Light-Weight ERF-Pspnet for Day-Night Driving Scenes

Monocular Depth Estimation Based on Unsupervised Learning

Depth Estimation of Traffic Scenes from Image Sequence Using Deep Learning

An Algorithm on Monocular 3D Object Detection Based on Depth Estimation

Depth incorporating with color improves salient object detection

Monocular 3D Object Detection With Sequential Feature Association and Depth Hint Augmentation

Depth-Enhancement Network for Monocular 3D object detection

Depth Estimation Using Feature Pyramid U-Net and Polarized Self-Attention for Road Scenes

Task-Aware Monocular Depth Estimation for 3D Object Detection

Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking

Boosting Monocular 3D Object Detection with Object-Centric Auxiliary Depth Supervision

3D Object Aided Self-Supervised Monocular Depth Estimation

Self-supervised Monocular Depth Estimation with Multi-Scale Structure Similarity Loss

Road Object Detection Using a Disparity-Based Fusion Model

MonoCD: Monocular 3D Object Detection with Complementary Depths

Monocular Visual Object 3D Localization in Road Scenes

Object Detection and Depth Estimation Approach Based on Deep Convolutional Neural Networks

3D Street Object Detection from Monocular Images Using Deep Learning and Depth Information

3-D LiDAR + Monocular Camera: an Inverse-Depth-Induced Fusion Framework for Urban Road Detection