Depth Estimation and Image Restoration by Deep Learning from Defocused Images

Saqib Nazir, Lorenzo Vaquero, Manuel Mucientes, Víctor M. Brea, Daniela Coltuc
DOI: https://doi.org/10.1109/TCI.2023.3288335
2023-07-28
Abstract: Monocular depth estimation and image deblurring are two fundamental tasks in computer vision, given their crucial role in understanding 3D scenes. Performing either of them from a single image is an ill-posed problem. Recent advances in Deep Convolutional Neural Networks (DNNs) have revolutionized many tasks in computer vision, including depth estimation and image deblurring. With defocused images, depth estimation and recovery of the All-in-Focus (Aif) image become related problems because of defocus physics. Despite this, most existing models treat them separately. There are, however, recent models that solve both problems by chaining two networks in sequence: the first estimates the depth or defocus map, and the second reconstructs the focused image from it. We propose a DNN that solves depth estimation and image deblurring in parallel. Our Two-headed Depth Estimation and Deblurring Network (2HDED:NET) extends a conventional Depth from Defocus (DFD) network with a deblurring branch that shares the same encoder as the depth branch. The proposed method has been successfully tested on two benchmarks, one for indoor and the other for outdoor scenes: NYU-v2 and Make3D. Extensive experiments with 2HDED:NET on these benchmarks have demonstrated performance superior or close to that of state-of-the-art models for depth estimation and image deblurring.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The paper addresses depth estimation and image deblurring from a single defocused image. Both tasks matter in computer vision because they are central to understanding 3D scenes. Performing either one from a single image is an ill-posed problem: multiple solutions are consistent with the input, so the correct one cannot be determined uniquely. Deep convolutional neural networks (DNNs) have significantly improved performance on both tasks, but most existing models still treat them separately.

The paper proposes a new deep learning model, the Two-headed Depth Estimation and Deblurring Network (2HDED:NET), that solves both problems simultaneously. 2HDED:NET extracts multi-level features with a shared encoder and passes them to two decoders: a Depth Estimation Decoder (DED), which generates the depth map, and an All-in-Focus Image Decoder (AifD), which restores the sharp image. Because the two decoders run in parallel, the deblurring branch does not depend on the estimated depth map, and the approach is also reported to improve the accuracy of depth estimation.

The main contributions of 2HDED:NET are:
1. A parallel architecture that recovers an all-in-focus image and generates a depth map from a single defocused image.
2. A balanced design that gives equal importance to depth estimation and image deblurring.
3. A hybrid loss function that combines the depth estimation and image deblurring loss terms with regularization terms, encouraging the encoder to learn richer semantic features.
4. Extensive experiments on the NYU-v2 and Make3D datasets that validate the effectiveness of the method.
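The idea of a hybrid loss can be illustrated with a minimal pure-Python sketch. It is not the loss used in the paper: the choice of an L1 term for each head, the smoothness penalty standing in for the regularization terms, the weights `w_depth`, `w_aif`, and `w_reg`, and all function names are assumptions made for illustration. Images and depth maps are represented as flat lists of floats to keep the example self-contained.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equal-length sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def smoothness_penalty(values):
    """Mean absolute difference between neighbouring values.

    Hypothetical stand-in for the paper's regularization terms.
    """
    if len(values) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)


def hybrid_loss(depth_pred, depth_gt, aif_pred, aif_gt,
                w_depth=1.0, w_aif=1.0, w_reg=0.1):
    """Weighted sum of a depth term, a deblurring term, and a regularizer.

    The weights are illustrative defaults, not values from the paper.
    """
    depth_term = l1_loss(depth_pred, depth_gt)   # depth head error
    aif_term = l1_loss(aif_pred, aif_gt)         # all-in-focus head error
    reg_term = smoothness_penalty(depth_pred)    # assumed regularizer
    return w_depth * depth_term + w_aif * aif_term + w_reg * reg_term


if __name__ == "__main__":
    # Toy values only; real inputs would be network outputs and ground truth.
    loss = hybrid_loss([1.0, 1.5, 2.0], [1.0, 2.0, 2.0],
                       [0.2, 0.4], [0.25, 0.5])
    print(f"hybrid loss: {loss:.4f}")
```

Because both heads share one encoder, minimizing a single scalar of this form sends gradients from both tasks into the shared weights, which is what lets the deblurring objective influence the features used for depth and vice versa.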
In summary, the study aims to improve both depth estimation and image deblurring from a single defocused image through a multi-task learning approach.