Single-Shot Pose Estimation of Surgical Robot Instruments' Shafts from Monocular Endoscopic Images

Masakazu Yoshimura, Murilo M. Marinho, Kanako Harada, Mamoru Mitsuishi
DOI: https://doi.org/10.1109/ICRA40945.2020.9196779
2020-03-03
Abstract: Surgical robots are used to perform minimally invasive surgery and alleviate much of the burden imposed on surgeons. Our group has developed a surgical robot to aid in the removal of tumors at the base of the skull via access through the nostrils. To avoid injuring patients, the robot uses a collision-avoidance algorithm that depends on an accurate model of the poses of the instruments' shafts. Given that the model's parameters can change over time owing to interactions between instruments and other disturbances, the online estimation of the poses of the instruments' shafts is essential. In this work, we propose a new method to estimate the poses of the surgical instruments' shafts using a monocular endoscope. Our method is based on the use of an automatically annotated training dataset and an improved pose-estimation deep-learning architecture. In preliminary experiments using artificial images, we show that our method can surpass state-of-the-art vision-based marker-less pose estimation techniques (providing an error decrease of 55% in position estimation, 64% in pitch, and 69% in yaw).
Computer Vision and Pattern Recognition, Robotics, Image and Video Processing
### What problem does this paper attempt to solve?

This paper addresses the real-time estimation of the poses of surgical robot instrument shafts from monocular endoscope images in minimally invasive surgery. Specifically:

1. **Background and Requirements**:
   - Surgical robots are used to perform minimally invasive surgery, reducing the burden on surgeons. The authors' group has developed a surgical robot for transnasal resection of skull-base tumors.
   - To avoid harming patients, the robot relies on a collision-avoidance algorithm, which in turn depends on an accurate model of the poses of the instrument shafts.
2. **Existing Challenges**:
   - Interactions between instruments and other disturbances cause the model's parameters to change over time, so real-time estimation of the shaft poses is crucial.
   - Existing methods (such as stereo cameras or ultrasonic or electromagnetic trackers) are difficult to deploy in existing operating room environments, and the sensors themselves have limitations.
3. **Research Objectives**:
   - Propose a new method that estimates the poses of surgical instrument shafts from monocular endoscope images, using an automatically annotated training dataset and an improved deep-learning architecture for pose estimation.
   - Overcome the limitations of existing methods, especially under view distortion and occlusion, and improve the accuracy of pose estimation.
4. **Specific Problems**:
   - How to accurately estimate the poses of surgical instrument shafts in monocular endoscope images to ensure the safety and precision of robot-assisted surgery.
   - How to handle view distortion and occlusion so that the estimates are more robust and reliable.

### Method Overview

To achieve the above goals, the authors propose a new deep-learning architecture.
The main improvements include:

- **Regression Instead of Classification**: The SSD-6D network is modified to treat pose estimation as a regression task rather than a classification task, which improves the accuracy of the estimate.
- **Data Augmentation**: Computer-graphics rendering software is used to generate a large amount of automatically annotated artificial training data, which is combined with real endoscope images for data augmentation.
- **Multi-task Learning**: Pose estimation is combined with bounding-box prediction and class-confidence prediction to further improve the quality of the pose estimate.

With these improvements, the authors report in preliminary experiments on artificial images that the method outperforms state-of-the-art vision-based marker-less techniques, reducing the error by 55% in position, 64% in pitch, and 69% in yaw.

### Summary

The main contribution of this paper is a new deep-learning method that accurately estimates the poses of surgical instrument shafts from monocular endoscope images, providing a more reliable basis for collision avoidance in robot-assisted surgery.
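
To make the "regression instead of classification" and "multi-task learning" points from the Method Overview more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a classifier would choose among equally spaced angle bins, and that the multi-task objective is a weighted sum of a confidence term and Smooth L1 terms for the box and pose outputs. The function names, the bin count, the angle range, and the weights `w_box` and `w_pose` are all hypothetical and are not taken from the paper.

```python
def worst_case_bin_error(lo, hi, n_bins):
    """Largest error an ideal classifier can make with equal-width bins (half a bin)."""
    return (hi - lo) / n_bins / 2.0


def classify_angle(angle, lo, hi, n_bins):
    """Idealized classifier: snap the angle to the center of the bin containing it."""
    width = (hi - lo) / n_bins
    idx = min(int((angle - lo) / width), n_bins - 1)  # clamp angle == hi into last bin
    return lo + (idx + 0.5) * width


def smooth_l1(residual):
    """Smooth L1 penalty: quadratic for |r| < 1, linear otherwise."""
    r = abs(residual)
    return 0.5 * r * r if r < 1.0 else r - 0.5


def multitask_loss(conf_loss, box_residuals, pose_residuals, w_box=1.0, w_pose=1.0):
    """Hypothetical multi-task objective: confidence + weighted box and pose regression terms."""
    box_term = sum(smooth_l1(r) for r in box_residuals)
    pose_term = sum(smooth_l1(r) for r in pose_residuals)
    return conf_loss + w_box * box_term + w_pose * pose_term


if __name__ == "__main__":
    # Assumed setup: pitch in [-90, 90] degrees split into 18 bins of 10 degrees each.
    true_pitch = 12.0
    snapped = classify_angle(true_pitch, -90.0, 90.0, 18)
    print("bin center chosen by ideal classifier:", snapped)
    print("quantization error:", abs(snapped - true_pitch))
    print("worst-case quantization error:", worst_case_bin_error(-90.0, 90.0, 18))
    # A regressor outputs a continuous value, so it has no such quantization floor;
    # its residual feeds directly into the pose term of the loss.
    print("example loss:", multitask_loss(0.2, [0.5], [2.0]))
```

Under these assumptions, even a perfect classifier is limited by half the bin width, whereas a regression head can in principle drive its residual toward zero. This is one plausible reading of why the switch to regression improves accuracy.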