Deep learning in computed tomography super resolution using multi-modality data training

Wai Yan Ryana Fok, Andreas Fieselmann, Magdalena Herbst, Ludwig Ritschl, Steffen Kappler, Sylvia Saalfeld
DOI: https://doi.org/10.1002/mp.16825
Abstract: Background: One of the limitations in leveraging the potential of artificial intelligence in X-ray imaging is the limited availability of annotated training data. As X-ray and CT share similar imaging physics, one could achieve cross-domain data sharing by generating labeled synthetic X-ray images from annotated CT volumes as digitally reconstructed radiographs (DRRs). To account for the lower resolution of CT and the CT-generated DRRs as compared to real X-ray images, we propose the use of super-resolution (SR) techniques to enhance the CT resolution before DRR generation. Purpose: As spatial resolution can be defined by the modulation transfer function kernel in CT physics, we propose to train an SR network using paired low-resolution (LR) and high-resolution (HR) images generated by varying the kernel's shape and cutoff frequency. This differs from previous deep learning-based SR techniques on RGB and medical images, which focused on refining the sampling grid. Instead of generating LR images by bicubic interpolation, we aim to create realistic multi-detector CT (MDCT)-like LR images from HR cone-beam CT (CBCT) scans. Methods: We propose and evaluate the use of an SR U-Net for the mapping between LR and HR CBCT image slices. We reconstructed paired LR and HR training volumes from the same CT scans with a small in-plane sampling grid size of $0.20 \times 0.20 \, {\rm mm}^2$. We used the residual U-Net architecture to train two models: SRUN$^K_{Res}$, trained with kernel-based LR images, and SRUN$^I_{Res}$, trained with bicubic downsampled data as baseline. Both models were trained on one CBCT dataset (n = 13 391). The performance of both models was then evaluated on unseen kernel-based and interpolation-based LR CBCT images (n = 10 950), and also on MDCT images (n = 1392). Results: Five-fold cross validation and an ablation study were performed to find the optimal hyperparameters.
Both the SRUN$^K_{Res}$ and SRUN$^I_{Res}$ models show significant improvements (p-value $<$ 0.05) in mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM) on unseen CBCT images. Also, the improvement percentages in MAE, PSNR, and SSIM achieved by SRUN$^K_{Res}$ are larger than those of SRUN$^I_{Res}$. For SRUN$^K_{Res}$, MAE is reduced by 14%, and PSNR and SSIM are increased by 6% and 8%, respectively. To conclude, SRUN$^K_{Res}$ outperforms SRUN$^I_{Res}$: the former generates sharper images when tested with kernel-based LR CBCT images as well as cross-modality LR MDCT data. Conclusions: Our proposed method showed better performance than the baseline interpolation approach on unseen LR CBCT. We showed that the frequency behavior of the data used is important for learning the SR features. Additionally, we showed cross-modality resolution improvements on LR MDCT images. Our approach is, therefore, a first and essential step in enabling realistic high spatial resolution CT-generated DRRs for deep learning training.
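To make the kernel-based idea in the abstract more concrete, the following is a minimal sketch of how an LR slice could be simulated from an HR slice by attenuating high spatial frequencies while leaving the sampling grid unchanged, together with the MAE and PSNR metrics named above. It is not the authors' implementation. The Gaussian MTF shape, the convention that the cutoff is the frequency at which the MTF falls to 10%, the cutoff value of 0.6 cycles/mm, and all function names (`gaussian_mtf`, `simulate_lr`, `mae`, `psnr`) are assumptions chosen for illustration; only the 0.20 mm pixel size comes from the abstract.

```python
import numpy as np


def gaussian_mtf(shape, pixel_mm, cutoff_cyc_per_mm):
    """Radially symmetric Gaussian MTF (assumed shape, not from the paper).

    The MTF equals 1 at zero frequency and 0.1 at the cutoff frequency.
    """
    fy = np.fft.fftfreq(shape[0], d=pixel_mm)  # cycles/mm along rows
    fx = np.fft.fftfreq(shape[1], d=pixel_mm)  # cycles/mm along columns
    fr = np.hypot(fy[:, None], fx[None, :])    # radial frequency grid
    # Choose sigma so that exp(-0.5 * (cutoff / sigma)**2) == 0.1.
    sigma = cutoff_cyc_per_mm / np.sqrt(2.0 * np.log(10.0))
    return np.exp(-0.5 * (fr / sigma) ** 2)


def simulate_lr(hr, pixel_mm=0.20, cutoff_cyc_per_mm=0.6):
    """Low-pass filter an HR slice in the frequency domain.

    The output has the same grid as the input; only the frequency
    content changes (the cutoff value here is a hypothetical choice).
    """
    mtf = gaussian_mtf(hr.shape, pixel_mm, cutoff_cyc_per_mm)
    return np.real(np.fft.ifft2(np.fft.fft2(hr) * mtf))


def mae(ref, img):
    """Mean absolute error between a reference and a test image."""
    return float(np.mean(np.abs(ref - img)))


def psnr(ref, img, data_range):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    mse = float(np.mean((ref - img) ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hr = rng.random((64, 64))  # stand-in for an HR CBCT slice
    lr = simulate_lr(hr)       # same shape, reduced sharpness
    print(lr.shape, mae(hr, lr), psnr(hr, lr, data_range=1.0))
```

In contrast, a bicubic-downsampling baseline would change the sampling grid itself. Varying `cutoff_cyc_per_mm` (and, in a fuller version, the functional form of the MTF) is one way to produce a family of LR/HR training pairs with different frequency behavior.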