Abstract: Background and objective: Gastrointestinal (GI) endoscopy represents a promising tool for GI cancer screening. However, the limited field of view and the uneven skill of endoscopists make it difficult to accurately identify polyps and follow up on precancerous lesions under endoscopy. Estimating depth from GI endoscopic sequences is essential for a range of AI-assisted surgical techniques. Nonetheless, depth estimation for GI endoscopy is challenging because of the particular imaging environment and the limited datasets available. In this paper, we propose a self-supervised monocular depth estimation method for GI endoscopy. Methods: A depth estimation network and a camera ego-motion estimation network are first constructed to obtain the depth and pose information of the sequence, respectively. The model is then trained in a self-supervised manner by computing a loss that combines multi-scale structural similarity with the L1 norm (MS-SSIM+L1) between the target frame and the reconstructed image, which forms part of the training loss. The MS-SSIM+L1 loss preserves high-frequency information and maintains invariance of brightness and color. Our model is a U-shaped convolutional network with a dual-attention mechanism, which helps capture multi-scale contextual information and greatly improves the accuracy of depth estimation. We compared our method qualitatively and quantitatively against several state-of-the-art methods. Results and conclusions: The experimental results show that our method generalizes better, achieving lower error metrics and higher accuracy metrics on both the UCL dataset and the EndoSLAM dataset. The proposed method has also been validated on clinical GI endoscopy, demonstrating the potential clinical value of the model.
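To make the photometric term described in the Methods more concrete, the following is a minimal NumPy sketch of an MS-SSIM+L1 loss between a target frame and a reconstructed image. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 3x3 uniform window, the three scales, the equal per-scale weights, and the mixing weight `alpha=0.84` are all hypothetical choices not given in the abstract, and a real training pipeline would use a differentiable framework rather than NumPy.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _local_mean(x, win):
    # Mean over every win x win patch (valid region only).
    return sliding_window_view(x, (win, win)).mean(axis=(-1, -2))


def _ssim_terms(a, b, win=3, c1=0.01 ** 2, c2=0.03 ** 2):
    # Mean luminance term and mean contrast-structure term
    # for two grayscale images with values in [0, 1].
    mu_a, mu_b = _local_mean(a, win), _local_mean(b, win)
    var_a = _local_mean(a * a, win) - mu_a ** 2
    var_b = _local_mean(b * b, win) - mu_b ** 2
    cov = _local_mean(a * b, win) - mu_a * mu_b
    lum = (2 * mu_a * mu_b + c1) / (mu_a ** 2 + mu_b ** 2 + c1)
    cs = (2 * cov + c2) / (var_a + var_b + c2)
    return lum.mean(), cs.mean()


def _downsample(x):
    # 2x2 average pooling; an odd trailing row or column is cropped.
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def ms_ssim(a, b, scales=3):
    # Assumption: equal weights per scale (hypothetical choice).
    weights = np.full(scales, 1.0 / scales)
    value = 1.0
    for i in range(scales):
        lum, cs = _ssim_terms(a, b)
        cs = max(cs, 1e-6)  # keep the fractional power well defined
        if i == scales - 1:
            # Luminance is only included at the coarsest scale.
            value *= (max(lum, 1e-6) * cs) ** weights[i]
        else:
            value *= cs ** weights[i]
            a, b = _downsample(a), _downsample(b)
    return value


def ms_ssim_l1_loss(target, recon, alpha=0.84):
    # Assumption: alpha is an illustrative mixing weight.
    l1 = np.abs(target - recon).mean()
    return alpha * (1.0 - ms_ssim(target, recon)) + (1.0 - alpha) * l1
```

In a self-supervised setup of the kind the abstract describes, `recon` would be the image synthesized from a neighbouring frame using the predicted depth and camera pose, and this term would be one component of the total training loss.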
