Abstract:The depth images acquired by consumer depth sensors (e.g., Kinect and ToF) usually are of low resolution and insufficient quality. One natural solution is to incorporate a high resolution RGB camera and exploit the statistical correlation of its data and depth. In recent years, both optimization-based and learning-based approaches have been proposed to deal with the guided depth reconstruction problems. In this paper, we introduce a weighted analysis sparse representation (WASR) model for guided depth image enhancement, which can be considered a generalized formulation of a wide range of previous optimization-based models. We unfold the optimization by the WASR model and conduct guided depth reconstruction with dynamically changed stage-wise operations. Such a guidance strategy enables us to dynamically adjust the stage-wise operations that update the depth image, thus improving the reconstruction quality and speed. To learn the stage-wise operations in a task-driven manner, we propose two parameterizations and their corresponding methods: dynamic guidance with Gaussian RBF nonlinearity parameterization (DG-RBF) and dynamic guidance with CNN nonlinearity parameterization (DG-CNN). The network structures of the proposed DG-RBF and DG-CNN methods are designed with the the objective function of our WASR model in mind and the optimal network parameters are learned from paired training data. Such optimization-inspired network architectures enable our models to leverage the previous expertise as well as take benefit from training data. The effectiveness is validated for guided depth image super-resolution and for realistic depth image reconstruction tasks using standard benchmarks. Our DG-RBF and DG-CNN methods achieve the best quantitative results (RMSE) and better visual quality than the state-of-the-art approaches at the time of writing. The code is available at https://github.com/ShuhangGu/GuidedDepthSR.

DRO: Deep Recurrent Optimizer for Video to Depth

DRO: Deep Recurrent Optimizer for Structure-from-Motion.

CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos

Depth Estimation of Traffic Scenes from Image Sequence Using Deep Learning.

NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation

Learning Depth from Monocular Videos Using Direct Methods

Video Depth without Video Models

Towards Practical Consistent Video Depth Estimation.

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

Depth Estimation of Video Sequences with Perceptual Losses

CCDepth: A Lightweight Self-supervised Depth Estimation Network with Enhanced Interpretability

Recurrent Neural Network for Learning DenseDepth and Ego-Motion from Video

RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo

Unsupervised Learning of Depth from Monocular Videos Using 3D-2D Corresponding Constraints

Self-supervised Depth Estimation Leveraging Global Perception and Geometric Smoothness Using On-board Videos

Learned Dynamic Guidance for Depth Image Reconstruction

DF-VO: What Should Be Learnt for Visual Odometry?

Boosting Monocular Depth Estimation with Sparse Guided Points

Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks