Mastoidectomy Multi-View Synthesis from a Single Microscopy Image

Yike Zhang, Jack Noble

2024-09-01

Abstract: Cochlear Implant (CI) procedures involve performing an invasive mastoidectomy to insert an electrode array into the cochlea. In this paper, we introduce a novel pipeline that is capable of generating synthetic multi-view videos from a single CI microscope image. In our approach, we use a patient's pre-operative CT scan to predict the post-mastoidectomy surface using a method designed for this purpose. We manually align the surface with a selected microscope frame to obtain an accurate initial pose of the reconstructed CT mesh relative to the microscope. We then perform UV projection to transfer the colors from the frame to surface textures. Novel views of the textured surface can be used to generate a large dataset of synthetic frames with ground truth poses. We evaluated the quality of synthetic views rendered using PyTorch3D and PyVista. We found that both rendering engines lead to similarly high-quality synthetic novel-view frames compared to ground truth, with a structural similarity index for both methods averaging about 0.86. A large dataset of novel views with known poses is critical for ongoing training of a method to automatically estimate microscope pose for 2D to 3D registration with the pre-operative CT to facilitate augmented reality surgery. This dataset will empower various downstream tasks, such as integrating Augmented Reality (AR) in the OR, tracking surgical tools, and supporting other video analysis studies.

Computer Vision and Pattern Recognition, Graphics

What problem does this paper attempt to address?

This paper addresses how to generate multi-view synthetic videos from a single microscope image acquired during cochlear implant (CI) surgery. CI surgery requires an invasive mastoidectomy to insert the electrode array into the cochlea. The procedure demands precise operation because important structures such as the facial nerve, aponeurosis and ossicles may be occluded or hidden. To improve visualization and surgical precision, the authors propose a pipeline with the following steps:

1. **Predict the postoperative surface using preoperative CT scans**: Use the patient's preoperative CT scan to predict the surface after mastoidectomy.
2. **Manually align the surface with the microscope image**: Manually align the predicted surface with a selected microscope frame to obtain an accurate initial pose of the reconstructed CT mesh relative to the microscope.
3. **UV-projection texture mapping**: Transfer the colors of the microscope image to the surface texture through UV projection, producing a three-dimensional model with real color information.
4. **Generate multi-view synthetic videos**: Randomly generate camera poses and render a large number of novel-view images. These images have known ground-truth poses and can be used to train methods that automatically estimate the microscope pose.

This method addresses the shortage of pose-labeled microscope video data in CI surgery and supports downstream tasks such as augmented reality (AR) surgery, surgical tool tracking and other video analysis research.

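
The random pose generation in step 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes novel poses are obtained by perturbing the manually aligned initial pose with a small random rotation and translation. The function names `rodrigues` and `sample_poses`, and the perturbation ranges `max_angle_deg` and `max_shift`, are hypothetical and not taken from the paper.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for a rotation of `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def sample_poses(R0, t0, n, max_angle_deg=10.0, max_shift=5.0, seed=0):
    """Sample n poses near the initial pose (R0, t0).

    The perturbation ranges are illustrative assumptions: rotation angle in
    degrees, translation in the same units as t0.
    """
    rng = np.random.default_rng(seed)
    poses = []
    for _ in range(n):
        axis = rng.normal(size=3)  # random rotation axis
        angle = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
        dR = rodrigues(axis, angle)
        dt = rng.uniform(-max_shift, max_shift, size=3)
        # Each pose is stored with the rotation and translation that produced
        # it, which is what makes the rendered frames pose-labeled.
        poses.append((dR @ R0, t0 + dt))
    return poses

# Usage: five poses around an identity rotation, 100 units from the surface.
poses = sample_poses(np.eye(3), np.array([0.0, 0.0, 100.0]), n=5)
```

Each sampled pose would then be passed to a renderer such as PyTorch3D or PyVista to produce one synthetic frame; the rendering call itself is omitted here.
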
### Formula Explanation

The UV-projection formula mentioned in the paper is as follows:

\[
\begin{pmatrix} u \\ v \\ w \end{pmatrix}
=
\begin{pmatrix} f_x & 0 & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} R_{3 \times 3} & t_{3 \times 1} \\ 0_{1 \times 3} & 1_{1 \times 1} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\]

where:

- \( f_x \) and \( f_y \) are the focal lengths of the camera,
- \( c_x \) and \( c_y \) are the coordinates of the camera's principal point,
- \( R_{3 \times 3} \) is the rotation matrix,
- \( t_{3 \times 1} \) is the translation vector,
- \( (X, Y, Z, 1) \) is the homogeneous representation of the world coordinates of the vertex,
- \( (u, v, w) \) is the homogeneous coordinate on the image plane.

Through this formula, the world coordinates are first converted into camera coordinates by the rotation and translation, and then into homogeneous image-plane coordinates by the camera intrinsic matrix. Finally, the image coordinates \( I(u/w, v/w) \) are calculated, and the RGB color values are obtained by linear interpolation.

This method not only improves visualization during surgery, but also provides valuable data resources for future automated surgical assistance systems.
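
The projection and color lookup described in the formula explanation can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the function names `project_vertices` and `sample_bilinear` are hypothetical, the intrinsic matrix is written in its equivalent 3 x 3 form (the zero fourth column of the 3 x 4 matrix is dropped), and the interpolation is assumed to be bilinear over the four neighboring pixels. Occlusion handling and vertices behind the camera are not addressed.

```python
import numpy as np

def project_vertices(vertices, K, R, t):
    """Project N x 3 world points; returns (N x 2 pixel coords, N depths w)."""
    vertices = np.asarray(vertices, dtype=float)
    cam = vertices @ R.T + t   # world -> camera coordinates: R X + t
    uvw = cam @ K.T            # camera -> homogeneous image coords (u, v, w)
    w = uvw[:, 2]
    return uvw[:, :2] / w[:, None], w  # perspective divide: (u/w, v/w)

def sample_bilinear(image, pixels):
    """Bilinearly interpolate an H x W x 3 image at (x, y) pixel positions."""
    h, w = image.shape[:2]
    # Clamp to the image so out-of-frame vertices take the nearest edge color.
    x = np.clip(pixels[:, 0], 0.0, w - 1.0)
    y = np.clip(pixels[:, 1], 0.0, h - 1.0)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    ax = (x - x0)[:, None]  # horizontal blend weight
    ay = (y - y0)[:, None]  # vertical blend weight
    top = (1 - ax) * image[y0, x0] + ax * image[y0, x1]
    bottom = (1 - ax) * image[y1, x0] + ax * image[y1, x1]
    return (1 - ay) * top + ay * bottom

# Usage with made-up intrinsics: f_x = f_y = 100, c_x = c_y = 50.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
verts = np.array([[0.0, 0.0, 10.0], [1.0, 0.0, 10.0]])
pixels, depths = project_vertices(verts, K, np.eye(3), np.zeros(3))
# The first vertex lies on the optical axis and lands on the principal point
# (50, 50); the second lands at pixel (60, 50).
frame = np.zeros((100, 100, 3))
colors = sample_bilinear(frame, pixels)  # one RGB value per vertex
```

The per-vertex colors returned by `sample_bilinear` are what would be written into the surface texture in the UV-projection step.
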
