Abstract: Estimating scene depth from a single image is widely applicable to understanding 3D environments, because images captured by consumer-level cameras are easy to obtain. Previous works exploit conditional random fields (CRFs) to estimate image depth, constraining neighboring pixels (or superpixels) with similar appearance to share the same depth. However, depth may vary significantly across a slanted surface, which leads to severe estimation errors. To eliminate those errors, we propose a superpixel-based normal guided scale invariant deep convolutional field that encourages neighboring superpixels with similar appearance to lie on the same 3D plane of the scene. To this end, a depth-normal multitask CNN is introduced to produce superpixel-wise depth and surface normal predictions simultaneously. To correct the errors of the roughly estimated superpixel-wise depth, we develop a normal guided scale invariant CRF (NGSI-CRF). NGSI-CRF consists of a scale invariant unary potential, which measures the relative depth between superpixels as well as the absolute depth of each superpixel, and a normal guided pairwise potential, which constrains the spatial relationships between superpixels in accordance with the 3D layout of the scene. In other words, the normal guided pairwise potential is designed to smooth the depth prediction without degrading its 3D structure. The superpixel-wise depth maps estimated by NGSI-CRF are then fed into a pixel-wise refinement module to produce a smooth, fine-grained depth prediction. Furthermore, we derive a closed-form solution for the maximum a posteriori (MAP) inference of NGSI-CRF, so the proposed network can be trained efficiently in an end-to-end manner. We conduct experiments on several datasets: NYU-D2, KITTI, and Make3D. The experimental results show that our method achieves superior performance in both indoor and outdoor scenes.
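
The two ideas named in the abstract, scale invariance and normal guided plane consistency, can be illustrated with a minimal Python sketch. This is not the NGSI-CRF formulation, which the abstract does not spell out. The first function is the widely used scale-invariant log-depth error, swapped in here as a stand-in for the unary term; the second is a simple point-to-plane residual, assumed here as one way to test whether two superpixels lie on the same 3D plane. The function names, the `lam` parameter, and the use of 3D centroids are all hypothetical choices made for illustration.

```python
import numpy as np


def scale_invariant_error(pred_depth, true_depth, lam=1.0):
    """Scale-invariant error in log-depth space (illustrative stand-in).

    With lam=1.0 a global scaling of the prediction leaves the error
    unchanged; with lam=0.0 it reduces to the plain mean squared log error.
    """
    d = np.log(np.asarray(pred_depth, dtype=float)) - np.log(
        np.asarray(true_depth, dtype=float)
    )
    n = d.size
    return float(np.mean(d ** 2) - lam * d.sum() ** 2 / n ** 2)


def coplanarity_residual(point_i, point_j, normal):
    """Distance of point_j from the plane through point_i with the given normal.

    A residual of zero means both 3D points (for example, hypothetical
    superpixel centroids) lie on the same plane, even if their depths differ.
    """
    normal = np.asarray(normal, dtype=float)
    unit_normal = normal / np.linalg.norm(normal)
    diff = np.asarray(point_i, dtype=float) - np.asarray(point_j, dtype=float)
    return float(abs(np.dot(unit_normal, diff)))


if __name__ == "__main__":
    true_depth = np.array([1.0, 2.0, 4.0])
    # A prediction that is wrong only by a global factor of 2.
    print(scale_invariant_error(2.0 * true_depth, true_depth))

    # Two points on a slanted plane: their depths (z) differ,
    # yet the normal-based residual treats them as consistent.
    p_i, p_j = [0.0, 0.0, 2.0], [1.0, 0.0, 1.0]
    print(coplanarity_residual(p_i, p_j, normal=[1.0, 0.0, 1.0]))
    # The same pair judged against a fronto-parallel normal is inconsistent.
    print(coplanarity_residual(p_i, p_j, normal=[0.0, 0.0, 1.0]))
```

The slanted-plane pair shows why a constraint that forces similar-looking neighbors to share one depth would penalize a correct prediction, while a constraint expressed through the surface normal would not.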

Depth Map Prediction from a Single Image with Generative Adversarial Nets

Conditional Generative Adversarial Network for Monocular Image Depth Map Prediction

Depth Generation Network: Estimating Real World Depth From Stereo And Depth Images

Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

Depth Estimation from Monocular Image and Coarse Depth Points Based on Conditional GAN

Generative Adversarial Networks for Unsupervised Monocular Depth Prediction

Depth Map Inpainting Using a Fully Convolutional Network

Least Square Estimation Network for Depth Completion

Promising Depth Map Prediction Method from a Single Image Based on Conditional Generative Adversarial Network

Unsupervised Learning of Depth Estimation and Camera Pose With Multi-Scale GANs

Depth Estimation from Monocular Images Using Dilated Convolution and Uncertainty Learning

DepthGAN: GAN-based depth generation from semantic layouts

Unpaired Single-Image Depth Synthesis with cycle-consistent Wasserstein GANs

Depth Images Could Tell Us More: Enhancing Depth Discriminability for RGB-D Scene Recognition

Monocular Depth Estimation with Guidance of Surface Normal Map

Dilated Fully Convolutional Neural Network for Depth Estimation from a Single Image

Boundary-induced and scene-aggregated network for monocular depth prediction

A Novel 3D-UNet Deep Learning Framework Based on High-Dimensional Bilateral Grid for Edge Consistent Single Image Depth Estimation

AGG-Net: Attention Guided Gated-convolutional Network for Depth Image Completion

Occlusion-aware Unsupervised Light Field Depth Estimation based on Multi-Scale GANs

Single Image Depth Estimation with Normal Guided Scale Invariant Deep Convolutional Fields