Realistic Surgical Image Dataset Generation Based On 3D Gaussian Splatting

Tianle Zeng, Gerardo Loza Galindo, Junlei Hu, Pietro Valdastri, Dominic Jones
2024-07-20
Abstract: Computer vision technologies markedly enhance the automation capabilities of robotic-assisted minimally invasive surgery (RAMIS) through advanced tool tracking, detection, and localization. However, the limited availability of comprehensive surgical datasets for training represents a significant challenge in this field. This research introduces a novel method that employs 3D Gaussian Splatting to generate synthetic surgical datasets. We propose a method for extracting and combining 3D Gaussian representations of surgical instruments and background operating environments, transforming and combining them to generate high-fidelity synthetic surgical scenarios. We developed a data recording system capable of acquiring images alongside tool and camera poses in a surgical scene. Using this pose data, we synthetically replicate the scene, thereby enabling direct comparisons of the synthetic image quality (29.592 PSNR). As a further validation, we compared two YOLOv5 models trained on the synthetic and real data, respectively, and assessed their performance in an unseen real-world test dataset. Comparing the performances, we observe an improvement in neural network performance, with the synthetic-trained model outperforming the real-world trained model by 12%, testing both on real-world data.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the scarcity of high-quality annotated data for training computer vision models in robotic-assisted minimally invasive surgery (RAMIS). It proposes a method based on 3D Gaussian Splatting for generating synthetic surgical image datasets: 3D Gaussian representations of surgical instruments and of the background operating environment are extracted separately, then transformed and combined to produce high-fidelity synthetic surgical scenes. The paper also introduces a data recording system that captures images of a surgical scene together with the tool and camera poses. This pose data is used to replicate the recorded scene synthetically, so that the synthetic images can be compared directly with the real ones. As further validation, two YOLOv5 models were trained, one on synthetic data and one on real data, and both were evaluated on an unseen real-world test dataset; the synthetic-trained model outperformed the real-trained model by 12%. Overall, the core contribution of the paper lies in the first application of 3D Gaussian Splatting to medical image dataset generation, and in a flexible and efficient method that not only generates high-quality synthetic image datasets but also automatically produces precise annotations, thereby aiding the training of neural networks.
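The two technical steps named above, posing a tool's Gaussians in a background scene and scoring a synthetic image against a real one with PSNR, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`transform_gaussians`, `merge_scenes`, `psnr`) are hypothetical, and each Gaussian is reduced to a mean and a covariance, whereas a real 3D Gaussian Splatting model also stores opacity and colour coefficients, which are omitted here.

```python
import numpy as np


def transform_gaussians(means, covs, rotation, translation):
    """Apply a rigid pose to N Gaussians (hypothetical helper).

    means: (N, 3), covs: (N, 3, 3), rotation: (3, 3), translation: (3,).
    A rigid transform maps mean -> R @ mean + t and cov -> R @ cov @ R.T.
    """
    new_means = means @ rotation.T + translation
    new_covs = rotation @ covs @ rotation.T  # broadcasts over the N axis
    return new_means, new_covs


def merge_scenes(background, tool):
    """Concatenate two (means, covs) sets into one scene (hypothetical helper)."""
    means = np.concatenate([background[0], tool[0]], axis=0)
    covs = np.concatenate([background[1], tool[1]], axis=0)
    return means, covs


def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = reference.astype(np.float64) - rendered.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Assumed toy data: one tool Gaussian, two background Gaussians.
    tool = (np.array([[1.0, 0.0, 0.0]]), np.array([np.diag([4.0, 1.0, 1.0])]))
    background = (np.zeros((2, 3)), np.stack([np.eye(3), np.eye(3)]))

    # Assumed tool pose: 90 degree rotation about z, then a shift along z.
    rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    posed_tool = transform_gaussians(*tool, rot_z, np.array([0.0, 0.0, 1.0]))

    scene_means, scene_covs = merge_scenes(background, posed_tool)
    print(scene_means.shape, scene_covs.shape)
```

In this simplified view, the recorded tool pose supplies `rotation` and `translation`, and the merged set would then be passed to a renderer using the recorded camera pose. The `psnr` function is the standard definition behind the kind of image quality figure quoted in the abstract.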