Abstract: Background and objective: Gastrointestinal (GI) endoscopy is a promising tool for GI cancer screening. However, the limited field of view and the uneven skill of endoscopists make it difficult to accurately identify polyps and follow up on precancerous lesions under endoscopy. Estimating depth from GI endoscopic sequences is essential for a range of AI-assisted surgical techniques. Nonetheless, depth estimation for GI endoscopy is challenging because of the particular characteristics of the environment and the limited availability of datasets. In this paper, we propose a self-supervised monocular depth estimation method for GI endoscopy. Methods: A depth estimation network and a camera ego-motion estimation network are first constructed to obtain the depth and pose information of the sequence, respectively. The model is then trained in a self-supervised manner by computing a multi-scale structural similarity with L1 norm (MS-SSIM+L1) loss between the target frame and the reconstructed image as part of the training loss. The MS-SSIM+L1 loss function preserves high-frequency information and can maintain brightness and color invariance. Our model consists of a U-shaped convolutional network with a dual-attention mechanism, which helps capture multi-scale contextual information and greatly improves the accuracy of depth estimation. We compared our method qualitatively and quantitatively with several state-of-the-art methods. Results and conclusions: The experimental results show that our method generalizes well, achieving lower error metrics and higher accuracy metrics on both the UCL dataset and the EndoSLAM dataset. The proposed method has also been validated on clinical GI endoscopy, demonstrating the potential clinical value of the model.
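
The MS-SSIM+L1 photometric term named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no hyperparameters, so the box-filter window (`k=7`), the three equally weighted scales, the mixing weight `alpha=0.84`, and all function names below are assumptions chosen for illustration. Inputs are assumed to be single-channel images scaled to [0, 1].

```python
import numpy as np

def box_filter(img, k):
    """Mean over every k x k window (valid positions only), via an integral image."""
    c = np.cumsum(np.cumsum(np.pad(img, ((1, 0), (1, 0))), axis=0), axis=1)
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)

def ssim_terms(x, y, k=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Return the mean luminance term and mean contrast-structure term of SSIM."""
    mu_x, mu_y = box_filter(x, k), box_filter(y, k)
    var_x = box_filter(x * x, k) - mu_x ** 2
    var_y = box_filter(y * y, k) - mu_y ** 2
    cov = box_filter(x * y, k) - mu_x * mu_y
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    cs = (2 * cov + c2) / (var_x + var_y + c2)
    return lum.mean(), cs.mean()

def downsample(img):
    """Halve the resolution with 2 x 2 average pooling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def ms_ssim(x, y, scales=3, k=7):
    """Simplified MS-SSIM: contrast-structure at every scale, luminance at the coarsest."""
    weight = 1.0 / scales  # equal weights are an assumption, not the published values
    result = 1.0
    for s in range(scales):
        lum, cs = ssim_terms(x, y, k)
        cs = max(cs, 0.0)  # guard against negative values before the fractional power
        if s == scales - 1:
            result *= (lum * cs) ** weight
        else:
            result *= cs ** weight
            x, y = downsample(x), downsample(y)
    return result

def ms_ssim_l1_loss(target, recon, alpha=0.84):
    """Weighted mix of (1 - MS-SSIM) and the mean absolute error."""
    l1 = np.abs(target - recon).mean()
    return alpha * (1.0 - ms_ssim(target, recon)) + (1.0 - alpha) * l1

# Usage: compare a target frame with a noisy stand-in for the reconstructed image.
rng = np.random.default_rng(0)
target = rng.random((64, 64))
recon = np.clip(target + 0.05 * rng.standard_normal((64, 64)), 0.0, 1.0)
loss = ms_ssim_l1_loss(target, recon)
```

In a self-supervised pipeline of the kind the abstract describes, `recon` would be the target view re-synthesized from a neighboring frame using the predicted depth and camera pose; here it is simply a noisy copy so the sketch stays self-contained. The loss is zero for identical images and grows as the reconstruction degrades.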

Self-supervised Monocular Depth Estimation with 3D Displacement Module for Laparoscopic Images

Monocular Depth Estimation Based on Unsupervised Learning

Self-Supervised Depth Estimation in Laparoscopic Image using 3D Geometric Consistency

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Confidence-aware self-supervised learning for dense monocular depth estimation in dynamic laparoscopic scene

Self-Supervised Generative Adversarial Network for Depth Estimation in Laparoscopic Images

Image Intrinsic-Based Unsupervised Monocular Depth Estimation in Endoscopy

Distilled Visual and Robot Kinematics Embeddings for Metric Depth Estimation in Monocular Scene Reconstruction

3D Object Aided Self-Supervised Monocular Depth Estimation

Self-supervised monocular depth estimation for high field of view colonoscopy cameras

Self-Supervised Siamese Learning on Stereo Image Pairs for Depth Estimation in Robotic Surgery

Unsupervised Monocular Estimation of Depth and Visual Odometry Using Attention and Depth-Pose Consistency Loss

SMUDLP: Self-Teaching Multi-Frame Unsupervised Endoscopic Depth Estimation with Learnable Patchmatch

MonoLoT: Self-Supervised Monocular Depth Estimation in Low-Texture Scenes for Automatic Robotic Endoscopy

Self-supervised monocular depth estimation for gastrointestinal endoscopy

WS-SfMLearner: Self-supervised Monocular Depth and Ego-motion Estimation on Surgical Videos with Unknown Camera Parameters

Joint estimation of depth and motion from a monocular endoscopy image sequence using a multi-loss rebalancing network

Details preserved unsupervised depth estimation by fusing traditional stereo knowledge from laparoscopic images

Self-Supervised Monocular Depth Estimation Based on High-Order Spatial Interactions

Self-supervised neural network-based endoscopic monocular 3D reconstruction method

A geometry-aware deep network for depth estimation in monocular endoscopy