Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos

Akshay Paruchuri, Samuel Ehrenstein, Shuxian Wang, Inbar Fried, Stephen M. Pizer, Marc Niethammer, Roni Sengupta

2024-08-21

Abstract: Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues. Despite promising progress on mainstream, natural image depth estimation, techniques perform poorly on endoscopy images due to a lack of strong geometric features and challenging illumination effects. In this paper, we utilize the photometric cues, i.e., the light emitted from an endoscope and reflected by the surface, to improve monocular depth estimation. We first create two novel loss functions with supervised and self-supervised variants that utilize a per-pixel shading representation. We then propose a novel depth refinement network (PPSNet) that leverages the same per-pixel shading representation. Finally, we introduce teacher-student transfer learning to produce better depth maps from both synthetic data with supervision and clinical data with self-supervision. We achieve state-of-the-art results on the C3VD dataset while estimating high-quality depth maps from clinical data. Our code, pre-trained models, and supplementary materials can be found on our project page: https://ppsnet.github.io/

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses monocular depth estimation in endoscopy videos. Existing techniques perform poorly on endoscopy images because the scenes lack strong geometric features and the illumination is challenging. The authors use near-field lighting, i.e., the light emitted by the endoscope and reflected by the surface, as a photometric cue to improve depth accuracy. The key contributions of the paper are:

1. **Supervised and self-supervised loss functions**: Two new loss functions, each with a supervised and a self-supervised variant, built on a per-pixel shading representation of the near-field lighting. They allow training on both synthetic data and real clinical data.
2. **Depth refinement network (PPSNet)**: A new depth refinement architecture that uses the same per-pixel shading representation to improve initial depth predictions.
3. **Teacher-student transfer learning**: A teacher model guides a student model on unlabeled real clinical data using the proposed self-supervised loss functions.
4. **Experimental results**: Evaluations on the C3VD dataset and on real clinical data, with state-of-the-art results reported on C3VD.

With these methods, the authors report improved monocular depth estimation in endoscopy videos, particularly for non-axial views, where existing techniques are limited.
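To make the idea of a per-pixel shading cue concrete, here is a minimal NumPy sketch of a generic near-field Lambertian shading map computed from a depth map. This page does not give the paper's exact formulation, so everything here is an assumption for illustration: a single point light co-located with the camera, inverse-square falloff, and the hypothetical helper names `backproject`, `normals_from_points`, and `near_field_shading`. It should not be read as PPSNet's implementation.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) to camera-space 3D points (H, W, 3), pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def normals_from_points(points):
    """Estimate unit surface normals from finite differences of the point map."""
    du = np.gradient(points, axis=1)  # change along image columns
    dv = np.gradient(points, axis=0)  # change along image rows
    n = np.cross(du, dv)
    n = n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12)
    # Orient every normal toward the camera at the origin.
    facing_away = np.sum(n * points, axis=-1, keepdims=True) > 0
    return np.where(facing_away, -n, n)

def near_field_shading(depth, fx, fy, cx, cy, light_pos=(0.0, 0.0, 0.0)):
    """Illustrative shading map: clamp(N . L, 0) / distance^2 for a point light.

    Assumption: the light sits at `light_pos` in camera coordinates
    (default: at the camera center, as a rough endoscope model).
    """
    points = backproject(depth, fx, fy, cx, cy)
    to_light = np.asarray(light_pos, dtype=float) - points
    dist = np.linalg.norm(to_light, axis=-1)
    light_dir = to_light / (dist[..., None] + 1e-12)
    normals = normals_from_points(points)
    cosine = np.clip(np.sum(normals * light_dir, axis=-1), 0.0, None)
    return cosine / (dist ** 2 + 1e-12)

# Usage: a flat wall facing the camera at depth 2 (hypothetical intrinsics).
depth = np.full((5, 5), 2.0)
shading = near_field_shading(depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
```

In this toy model the shading value depends on both the distance to the surface and its orientation relative to the light, which is why such a cue can carry depth information even where texture and geometric features are weak.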

A Three-Dimensional Measurement Method for Binocular Endoscopes Based on Deep Learning

Calibration-free Deep Optics for Depth Estimation with Precise Simulation

Self-Supervised Monocular Depth Estimation for Endoscopic Imaging

Image Intrinsic-Based Unsupervised Monocular Depth Estimation in Endoscopy

A geometry-aware deep network for depth estimation in monocular endoscopy

Distilled Visual and Robot Kinematics Embeddings for Metric Depth Estimation in Monocular Scene Reconstruction

Monocular endoscopy images depth estimation with multi-scale residual fusion

Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy

EndoDepthL: Lightweight Endoscopic Monocular Depth Estimation with CNN-Transformer

Tackling Challenges of Low-texture and Illumination Variations for Endoscopy Self-supervised Monocular Depth Estimation

Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors

EndoPerfect: A Hybrid NeRF-Stereo Vision Approach Pioneering Monocular Depth Estimation and 3D Reconstruction in Endoscopy

MonoLoT: Self-Supervised Monocular Depth Estimation in Low-Texture Scenes for Automatic Robotic Endoscopy

Depth estimation from monocular endoscopy using simulation and image transfer approach

Self-supervised monocular depth estimation for gastrointestinal endoscopy

Self-supervised endoscopy depth estimation framework with CLIP-guidance segmentation

An Enhanced Synthetic Cystoscopic Environment for Use in Monocular Depth Estimation

Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling

Self-supervised monocular depth estimation for high field of view colonoscopy cameras