Abstract:

Background and Objective: Recovering high-quality 3D point clouds from monocular endoscopic images is a challenging task. This paper proposes a novel deep learning-based computational framework for 3D point cloud reconstruction from single monocular endoscopic images.

Methods: An unsupervised mono-depth learning network is used to generate depth information from monocular images. Given a single mono endoscopic image, the network is capable of depicting a depth map. The depth map is then used to recover a dense 3D point cloud. A generative Endo-AE network based on an auto-encoder is trained to repair defects of the dense point cloud by generating the best representation from the incomplete data. The performance of the proposed framework is evaluated against state-of-the-art learning-based methods. The results are also compared with non-learning based stereo 3D reconstruction algorithms.

Results: Our proposed methods outperform both the state-of-the-art learning-based and non-learning based methods for 3D point cloud reconstruction. The Endo-AE model for point cloud completion can generate high-quality, dense 3D endoscopic point clouds from incomplete point clouds with holes. Our framework is able to recover complete 3D point clouds with the missing rate of information up to 60%. Five large medical in-vivo databases of 3D point clouds of real endoscopic scenes have been generated and two synthetic 3D medical datasets are created. We have made these datasets publicly available for researchers free of charge.

Conclusions: The proposed computational framework can produce high-quality and dense 3D point clouds from single mono-endoscopy images for augmented reality, virtual reality and other computer-mediated medical applications.

What problem does this paper attempt to address?

This paper addresses the problem of recovering high-quality, dense 3D point clouds from monocular endoscopic images. It proposes a deep learning-based computational framework that reconstructs a 3D point cloud from a single monocular endoscopic image. The main content and contributions of the paper are as follows:

### Background and Objectives

- **Background**: In minimally invasive surgery, endoscopes are used to observe the surface of internal organs, but the resulting images usually lack depth information, which makes it difficult to generate high-quality 3D point clouds.
- **Objective**: To propose a new computational framework that can recover high-quality, dense 3D point clouds from monocular endoscopic images to support computer-assisted medical applications such as augmented reality (AR) and virtual reality (VR).

### Methods

1. **Monocular Depth Learning Module**:
   - Uses an unsupervised monocular depth learning network to generate depth maps.
   - Takes a monocular endoscopic image as input and outputs a depth map.
2. **3D Point Cloud Extraction Module**:
   - Converts the generated depth maps into dense 3D point clouds.
   - Uses a coordinate transformation to convert pixel coordinates to world coordinates.
   - Extracts the color attributes of the input monocular endoscopic image and applies them to the 3D point cloud.
3. **3D Point Cloud Completion Module**:
   - Trains a generative Endo-AE network based on an autoencoder to repair defects in the initially generated 3D point clouds.
   - Simulates defects by randomly deleting contiguous points from the test data; the model then generates complete 3D point clouds from these incomplete inputs.

### Results

- **3D Point Cloud Reconstruction**: The proposed framework achieves an average Chamfer distance of 0.01514 mm on synthetic medical datasets, outperforming existing learning-based and non-learning-based methods.
- **3D Point Cloud Completion**: On the test dataset, the average Chamfer distance is 0.00236 mm when 20% of the input data is missing. Even with a missing rate as high as 60%, the quality of the completion results remains high, with an average Chamfer distance of 0.00804 mm.

### Main Contributions

1. **Proposed a New Computational Framework**: Combines two deep neural networks, one for monocular depth learning and the other for 3D point cloud completion, to recover high-quality 3D point clouds from monocular endoscopic images.
2. **Generated Multiple Large Medical Databases**: Generated five large in-vivo 3D point cloud databases of real endoscopic scenes and two synthetic 3D medical datasets, all freely available to the research community.

### Conclusion

- The proposed computational framework can generate high-quality, dense 3D point clouds from monocular endoscopic images, suitable for augmented reality, virtual reality, and other computer-assisted medical applications.
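
The point cloud extraction step and the Chamfer-distance metric described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a standard pinhole camera model with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, treats the camera frame as the world frame (identity extrinsics), and uses one common Chamfer variant (the sum of mean squared nearest-neighbor distances in both directions), which may differ from the definition used in the paper. The function names are illustrative.

```python
import numpy as np


def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth map into a colored point cloud (pinhole model).

    depth: (H, W) array of depth values along the camera z-axis.
    rgb:   (H, W, 3) array of colors for the same image.
    fx, fy, cx, cy: assumed intrinsics (focal lengths and principal point,
    in pixels); the paper's actual calibration is not known here.
    Returns (N, 3) points in the camera frame and their (N, 3) colors.
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    valid = points[:, 2] > 0  # drop pixels without a positive depth value
    return points[valid], colors[valid]


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Brute-force O(N * M) version, so suitable only for small clouds.
    """
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)  # (N, M) squared pairwise distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


# Toy usage: a flat 2x2 depth map at z = 2 with black pixels.
depth = np.full((2, 2), 2.0)
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
points, colors = depth_to_point_cloud(depth, rgb, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
error = chamfer_distance(points, points)  # distance of a cloud to itself
```

For clouds with many thousands of points, the brute-force pairwise matrix becomes too large for memory. A spatial index such as a k-d tree would be the usual replacement for the nearest-neighbor search.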

Recovering dense 3D point clouds from single endoscopic image

A Three-Dimensional Measurement Method for Binocular Endoscopes Based on Deep Learning

Three-Dimensional Stitching of Binocular Endoscopic Images Based on Feature Points

Self-supervised neural network-based endoscopic monocular 3D reconstruction method

3D reconstruction from endoscopy images: A survey

A geometry-aware deep network for depth estimation in monocular endoscopy

Self-supervised Dense 3D Reconstruction from Monocular Endoscopic Video

EndoPerfect: A Hybrid NeRF-Stereo Vision Approach Pioneering Monocular Depth Estimation and 3D Reconstruction in Endoscopy

Joint estimation of depth and motion from a monocular endoscopy image sequence using a multi-loss rebalancing network

Single-Image 3-D Reconstruction: Rethinking Point Cloud Deformation

Visual Enhanced 3D Point Cloud Reconstruction from A Single Image

Dense Point Cloud Reconstruction by Shape and Pose Features Learning

Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling

A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy from Monocular Endoscopic Video

MonoLoT: Self-Supervised Monocular Depth Estimation in Low-Texture Scenes for Automatic Robotic Endoscopy

Accurate and robust feature description and dense point-wise matching based on feature fusion for endoscopic images

3D endoscopic depth estimation using 3D surface-aware constraints

Depth estimation from monocular endoscopy using simulation and image transfer approach

Real-Time Dense Reconstruction with Binocular Endoscopy Based on StereoNet and ORB-SLAM

Distilled Visual and Robot Kinematics Embeddings for Metric Depth Estimation in Monocular Scene Reconstruction