Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos

Akshay Paruchuri, Samuel Ehrenstein, Shuxian Wang, Inbar Fried, Stephen M. Pizer, Marc Niethammer, Roni Sengupta
2024-08-21
Abstract: Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues. Despite promising progress in depth estimation for natural images, existing techniques perform poorly on endoscopy images due to a lack of strong geometric features and challenging illumination effects. In this paper, we utilize the photometric cues, i.e., the light emitted from an endoscope and reflected by the surface, to improve monocular depth estimation. We first create two novel loss functions with supervised and self-supervised variants that utilize a per-pixel shading representation. We then propose a novel depth refinement network (PPSNet) that leverages the same per-pixel shading representation. Finally, we introduce teacher-student transfer learning to produce better depth maps from both synthetic data with supervision and clinical data with self-supervision. We achieve state-of-the-art results on the C3VD dataset while estimating high-quality depth maps from clinical data. Our code, pre-trained models, and supplementary materials can be found on our project page: https://ppsnet.github.io/
Computer Vision and Pattern Recognition
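The per-pixel shading representation in the abstract rests on a near-field lighting model: brightness depends on the distance from the endoscope's light to the surface and on the surface orientation relative to that light. The following is a rough illustration only and is not the authors' formulation or code. It assumes a single point light co-located with the camera center, a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), Lambertian reflectance, normals estimated from depth by finite differences, and an optional angular falloff exponent `mu`. All function names are hypothetical.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) to camera-space 3D points (H, W, 3) with a pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def normals_from_points(points):
    """Estimate unit surface normals (facing the camera) from finite differences of 3D points."""
    dpdu = np.gradient(points, axis=1)  # change along image columns
    dpdv = np.gradient(points, axis=0)  # change along image rows
    n = np.cross(dpdv, dpdu)            # this order makes a fronto-parallel plane's normal point toward the camera (-z)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12
    return n

def per_pixel_shading(depth, fx, fy, cx, cy, mu=0.0):
    """Hypothetical near-field shading: Lambertian cosine term times inverse-square
    falloff, with optional angular falloff cos(theta)**mu around the light's axis (+z).
    Assumes the point light sits at the camera center and depth > 0 everywhere."""
    points = backproject(depth, fx, fy, cx, cy)
    normals = normals_from_points(points)
    r = np.linalg.norm(points, axis=-1)           # distance from the light to each surface point
    to_light = -points / r[..., None]             # unit vector from the surface point toward the light
    cos_incidence = np.clip(np.sum(normals * to_light, axis=-1), 0.0, None)
    cos_axis = points[..., 2] / r                 # cosine between the viewing ray and the light's axis
    attenuation = cos_axis ** mu / r ** 2         # inverse-square falloff with optional angular falloff
    return cos_incidence * attenuation
```

For a flat surface facing the camera at depth 2, the center pixel receives shading 1/2² = 0.25, and moving the surface to depth 1 raises it to 1.0. This strong dependence on distance is the photometric cue that a shading-based loss or refinement network can exploit.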
What problem does this paper attempt to address?
The paper addresses monocular depth estimation in endoscopy videos. Existing techniques perform poorly in this setting because endoscopic images lack strong geometric features and exhibit complex lighting effects. The authors propose using near-field lighting, that is, the light emitted by the endoscope and reflected by the tissue surface, as a photometric cue to improve depth accuracy. The key contributions are:

1. **Supervised and self-supervised loss functions**: Two new loss functions, each with a supervised and a self-supervised variant, are built on a per-pixel shading representation derived from the endoscope's near-field lighting. This allows training on both synthetic data and real clinical data.
2. **Depth refinement network (PPSNet)**: A new depth refinement architecture that uses the same per-pixel shading representation to improve initial depth predictions.
3. **Teacher-student transfer learning**: A teacher model guides a student model to learn from unlabeled real clinical data, with the student also trained using the proposed self-supervised loss functions.
4. **Experimental results**: Extensive evaluations on the C3VD dataset and on real clinical data achieve state-of-the-art results on C3VD and produce high-quality depth maps on clinical data.

With these methods, the authors report improved monocular depth estimation in endoscopy videos, particularly for non-axial views, where existing techniques struggle.
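The summary does not give the exact loss definitions, so the following is only a minimal sketch of how shading-based supervised and self-supervised terms and a teacher-student objective could fit together. It assumes shading maps have already been computed from depth. It uses L1 distances and divides by the median to cancel an unknown global scale, such as albedo times light intensity, in the self-supervised term. It combines the teacher pseudo-label term and the shading term with a simple weighted sum. All function names and the `weight` value are hypothetical, not taken from the paper.

```python
import numpy as np

def normalize_by_median(x, eps=1e-8):
    """Remove an unknown global scale (e.g. albedo times light intensity) by dividing by the median."""
    return x / (np.median(x) + eps)

def supervised_shading_loss(shading_pred, shading_gt):
    """L1 distance between shading maps computed from predicted and ground-truth depth."""
    return float(np.mean(np.abs(shading_pred - shading_gt)))

def self_supervised_shading_loss(shading_pred, intensity):
    """Compare predicted shading with image intensity up to a global scale,
    assuming roughly uniform albedo and Lambertian reflectance."""
    return float(np.mean(np.abs(normalize_by_median(shading_pred) - normalize_by_median(intensity))))

def student_loss(depth_student, depth_teacher, shading_student, intensity, weight=0.5):
    """Hypothetical student objective on unlabeled clinical frames: follow the teacher's
    pseudo-label depth, plus a weighted self-supervised shading term."""
    distill = float(np.mean(np.abs(depth_student - depth_teacher)))
    return distill + weight * self_supervised_shading_loss(shading_student, intensity)
```

Median normalization is one simple way to make the self-supervised term insensitive to unknown brightness scale. With it, an image that is exactly three times the predicted shading yields a loss near zero.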