Mastoidectomy Multi-View Synthesis from a Single Microscopy Image

Yike Zhang, Jack Noble
2024-09-01
Abstract: Cochlear Implant (CI) procedures involve performing an invasive mastoidectomy to insert an electrode array into the cochlea. In this paper, we introduce a novel pipeline that is capable of generating synthetic multi-view videos from a single CI microscope image. In our approach, we use a patient's pre-operative CT scan to predict the post-mastoidectomy surface using a method designed for this purpose. We manually align the surface with a selected microscope frame to obtain an accurate initial pose of the reconstructed CT mesh relative to the microscope. We then perform UV projection to transfer the colors from the frame to surface textures. Novel views of the textured surface can be used to generate a large dataset of synthetic frames with ground truth poses. We evaluated the quality of synthetic views rendered using PyTorch3D and PyVista. We found both rendering engines lead to similarly high-quality synthetic novel-view frames compared to ground truth with a structural similarity index for both methods averaging about 0.86. A large dataset of novel views with known poses is critical for ongoing training of a method to automatically estimate microscope pose for 2D to 3D registration with the pre-operative CT to facilitate augmented reality surgery. This dataset will empower various downstream tasks, such as integrating Augmented Reality (AR) in the OR, tracking surgical tools, and supporting other video analysis studies.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
This paper addresses how to generate synthetic multi-view videos from a single microscope image acquired during cochlear implant (CI) surgery. CI surgery requires an invasive mastoidectomy so that an electrode array can be inserted into the cochlea. The procedure demands precise operation because important structures such as the facial nerve, aponeurosis and ossicles may be occluded or hidden. To improve surgical visualization and precision, the authors propose a pipeline with the following steps:

1. **Predict the post-mastoidectomy surface**: Use the patient's pre-operative CT scan to predict the surface that remains after mastoidectomy, using a method designed for this purpose.
2. **Manually align the surface with the microscope image**: Manually align the predicted surface with a selected microscope frame to obtain an accurate initial pose of the reconstructed CT mesh relative to the microscope.
3. **UV projection texture mapping**: Transfer the colors of the microscope frame to the surface texture through UV projection, producing a three-dimensional model that carries the real color information.
4. **Generate multi-view synthetic videos**: Randomly generate camera poses and render a large number of novel-view frames from the textured surface. Each frame has a known ground truth pose, so the set can be used to train methods that automatically estimate the microscope pose.

The method addresses the shortage of pose-labeled microscope video data in CI surgery and supports downstream tasks such as augmented reality (AR) surgery, surgical tool tracking, and other video analysis research.
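
The random pose generation in step 4 could be sketched as below. This is only an illustration under stated assumptions: the summary does not say how the poses are sampled, so the perturbation scheme (a small random rotation and translation applied to the manually aligned initial pose), the function names `rodrigues` and `sample_poses`, and the default ranges `max_angle_deg` and `max_shift` are all hypothetical.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for a rotation of `angle` radians about `axis`."""
    axis = axis / np.linalg.norm(axis)
    x, y, z = axis
    # Skew-symmetric cross-product matrix of the unit axis.
    Kx = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * Kx + (1.0 - np.cos(angle)) * (Kx @ Kx)

def sample_poses(R0, t0, n, max_angle_deg=10.0, max_shift=5.0, seed=0):
    """Sample n poses (R, t) near the initial pose (R0, t0).

    The ranges are assumed values for illustration, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    poses = []
    for _ in range(n):
        axis = rng.normal(size=3)  # random rotation axis
        angle = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
        dR = rodrigues(axis, angle)
        dt = rng.uniform(-max_shift, max_shift, size=3)
        # Store the perturbed pose; it doubles as the ground truth label.
        poses.append((dR @ R0, t0 + dt))
    return poses

if __name__ == "__main__":
    for R, t in sample_poses(np.eye(3), np.array([0.0, 0.0, 100.0]), n=3):
        print(np.round(R, 3), np.round(t, 2))
```

Each sampled `(R, t)` pair would be passed to the renderer and stored alongside the rendered frame, which is what makes the resulting dataset pose-labeled.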
### Formula Explanation

The UV projection formula mentioned in the paper is as follows:

\[
\begin{pmatrix} u \\ v \\ w \end{pmatrix} =
\begin{pmatrix} f_x & 0 & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} R_{3 \times 3} & t_{3 \times 1} \\ 0_{1 \times 3} & 1_{1 \times 1} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\]

where:

- \( f_x \) and \( f_y \) are the focal lengths of the camera,
- \( c_x \) and \( c_y \) are the coordinates of the camera's principal point,
- \( R_{3 \times 3} \) is the rotation matrix,
- \( t_{3 \times 1} \) is the translation vector,
- \( (X, Y, Z, 1) \) is the homogeneous representation of the world coordinates of the vertex,
- \( (u, v, w) \) are the homogeneous image-plane coordinates of the projected vertex.

The formula first converts the world coordinates into camera coordinates using the rotation and translation, and then converts them into homogeneous image-plane coordinates using the camera intrinsic matrix. Finally, the image coordinates \( (u/w, v/w) \) are calculated, and the RGB color value \( I(u/w, v/w) \) is obtained from the image \( I \) by linear interpolation.

This method not only improves surgical visualization, but also provides a valuable data resource for future automated surgical assistance systems.
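
As a rough illustration of the projection formula, the NumPy sketch below projects mesh vertices into the image and looks up a color for each one. It is not the authors' implementation: the function names `project_vertices` and `sample_bilinear` are hypothetical, and bilinear interpolation over the four neighboring pixels is assumed as the concrete form of the "linear interpolation" mentioned above.

```python
import numpy as np

def project_vertices(vertices, K, R, t):
    """Project N x 3 world-space vertices; returns (u/w, v/w) and w."""
    n = vertices.shape[0]
    X_h = np.hstack([vertices, np.ones((n, 1))])   # N x 4 homogeneous points
    extrinsic = np.eye(4)                          # [R t; 0 1]
    extrinsic[:3, :3] = R
    extrinsic[:3, 3] = t
    K34 = np.hstack([K, np.zeros((3, 1))])         # 3 x 4 intrinsic matrix
    uvw = (K34 @ extrinsic @ X_h.T).T              # N x 3 rows of (u, v, w)
    return uvw[:, :2] / uvw[:, 2:3], uvw[:, 2]

def sample_bilinear(image, uv):
    """Interpolate H x W x C image colors at sub-pixel (column, row) positions."""
    h, w = image.shape[:2]
    u = np.clip(uv[:, 0], 0, w - 1)                # clamp to the image bounds
    v = np.clip(uv[:, 1], 0, h - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, w - 1)
    v1 = np.minimum(v0 + 1, h - 1)
    a = (u - u0)[:, None]                          # horizontal weight
    b = (v - v0)[:, None]                          # vertical weight
    top = (1 - a) * image[v0, u0] + a * image[v0, u1]
    bottom = (1 - a) * image[v1, u0] + a * image[v1, u1]
    return (1 - b) * top + b * bottom

if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 40.0], [0.0, 0.0, 1.0]])
    verts = np.array([[0.0, 0.0, 2.0], [0.1, 0.0, 2.0]])
    uv, depth = project_vertices(verts, K, np.eye(3), np.zeros(3))
    frame = np.random.default_rng(0).random((80, 100, 3))
    print(uv, sample_bilinear(frame, uv))
```

The sketch colors every vertex that projects inside the frame. A full texturing step would also need to skip vertices that are occluded or behind the camera (`w <= 0`), which is omitted here.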