Abstract: Purpose: Recent advances in computer vision and machine learning have resulted in endoscopic video-based solutions for dense reconstruction of the anatomy. To use these systems effectively in surgical navigation, a reliable image-based technique is required to continuously track the endoscopic camera's position within the anatomy, despite frequent removal and re-insertion. In this work, we investigate the use of recent learning-based keypoint descriptors for six degree-of-freedom camera pose estimation in intraoperative endoscopic sequences and under changes in anatomy due to surgical resection. Methods: Our method employs a dense structure from motion (SfM) reconstruction of the preoperative anatomy, obtained with a state-of-the-art patient-specific learning-based descriptor. During the reconstruction step, each estimated 3D point is associated with a descriptor. This information is employed in the intraoperative sequences to establish 2D-3D correspondences for Perspective-n-Point (PnP) camera pose estimation. We evaluate this method on six intraoperative sequences, obtained from two cadaveric subjects, that include anatomical modifications. Results: This approach led to translation and rotation errors of 3.9 mm and 0.2 radians, respectively, with 21.86% of cameras localized, averaged over the six sequences. Compared with an additional learning-based descriptor (HardNet++), the selected descriptor achieves a higher percentage of localized cameras with similar pose estimation performance. We further discuss potential error causes and limitations of the proposed approach. Conclusion: Patient-specific learning-based descriptors can relocalize images that are well distributed across the inspected anatomy, even where the anatomy is modified. However, camera relocalization in endoscopic sequences remains a challenging problem, and future research is necessary to increase the robustness and accuracy of this technique.
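
As a rough illustration of the 2D-3D correspondence step described in the Methods, the sketch below matches descriptors of intraoperative keypoints against descriptors attached to reconstructed 3D points. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how matches are selected, so the nearest-neighbour search with a ratio test, the function name `match_2d_3d`, the `ratio` threshold, and the toy three-dimensional descriptors are all hypothetical choices made for illustration.

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_2d_3d(keypoints, kp_descs, points_3d, pt_descs, ratio=0.8):
    """Pair each 2D keypoint with the 3D point whose descriptor is closest.

    A match is kept only if the best distance is clearly smaller than the
    second-best one (ratio test). This acceptance rule is an assumption;
    the source does not specify one.
    """
    matches = []
    for kp, desc in zip(keypoints, kp_descs):
        # Distances from this keypoint descriptor to every 3D-point descriptor.
        dists = sorted((l2(desc, pd), i) for i, pd in enumerate(pt_descs))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (best, idx), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((kp, points_3d[idx]))
    return matches

# Toy data (hypothetical): three reconstructed points with their descriptors.
points_3d = [(0.0, 0.0, 10.0), (5.0, 0.0, 12.0), (0.0, 5.0, 11.0)]
pt_descs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# Three keypoints from a query frame; the last one is ambiguous on purpose.
keypoints = [(100.0, 120.0), (300.0, 80.0), (50.0, 50.0)]
kp_descs = [(0.9, 0.1, 0.0), (0.0, 0.1, 0.95), (0.5, 0.5, 0.0)]

correspondences = match_2d_3d(keypoints, kp_descs, points_3d, pt_descs)
for kp, pt in correspondences:
    print(kp, "->", pt)
```

The resulting list of (2D keypoint, 3D point) pairs is the kind of input a PnP solver expects; in practice a robust solver is typically used so that remaining wrong matches do not corrupt the pose. The ambiguous third keypoint is rejected because its two closest descriptors are equally distant.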

Learning How To Robustly Estimate Camera Pose in Endoscopic Videos

Online 3D reconstruction and dense tracking in endoscopic videos

Stereo Video Reconstruction Without Explicit Depth Maps for Endoscopic Surgery

Distilled Visual and Robot Kinematics Embeddings for Metric Depth Estimation in Monocular Scene Reconstruction

ENeRF-SLAM: A Dense Endoscopic SLAM with Neural Implicit Representation

Long term and robust 6DoF motion tracking for highly dynamic stereo endoscopy videos

Investigating keypoint descriptors for camera relocalization in endoscopy surgery

Stereo Dense Scene Reconstruction and Accurate Localization for Learning-Based Navigation of Laparoscope in Minimally Invasive Surgery

BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications

Self-Supervised Siamese Learning on Stereo Image Pairs for Depth Estimation in Robotic Surgery

Deep Homography Prediction for Endoscopic Camera Motion Imitation Learning

Monocular pose estimation of articulated surgical instruments in open surgery

Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors

Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos

EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner

Deep Homography Estimation in Dynamic Surgical Scenes for Laparoscopic Camera Motion Extraction

A geometry-aware deep network for depth estimation in monocular endoscopy

Vision-Based Neurosurgical Guidance: Unsupervised Localization and Camera-Pose Prediction

Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments

Endo-Depth-and-Motion: Reconstruction and Tracking in Endoscopic Videos using Depth Networks and Photometric Constraints

Neural Rendering for Stereo 3D Reconstruction of Deformable Tissues in Robotic Surgery