Soft Masked Mamba Diffusion Model for CT to MRI Conversion

Zhenbin Wang, Lei Zhang, Lituan Wang, Zhenwei Zhang
2024-06-23
Abstract: Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) are the predominant modalities utilized in the field of medical imaging. Although MRI captures the complexity of anatomical structures in greater detail than CT, it entails higher financial costs and requires longer image acquisition times. In this study, we aim to train a latent diffusion model for CT to MRI conversion, replacing the commonly used U-Net or Transformer backbone with a State-Space Model (SSM) called Mamba that operates on latent patches. First, we note critical oversights in the scan scheme of most Mamba-based vision methods, including inadequate attention to the spatial continuity of patch tokens and the lack of consideration for their varying importance to the target task. Secondly, extending from this insight, we introduce Diffusion Mamba (DiffMa), employing a soft mask to integrate Cross-Sequence Attention into Mamba and conducting the selective scan in a spiral manner. Lastly, extensive experiments demonstrate impressive performance by DiffMa in medical image generation tasks, with notable advantages in input scaling efficiency over existing benchmark models. The code and models are available at https://github.com/wongzbb/DiffMa-Diffusion-Mamba
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the problem of converting computed tomography (CT) images into magnetic resonance imaging (MRI) images. MRI captures anatomical detail better than CT, but it is more expensive and takes longer to acquire. Using an image generation model to convert CT images into MRI images could therefore broaden the scope of diagnostic examinations without increasing costs. The paper proposes Diffusion Mamba (DiffMa), a latent diffusion model for the CT to MRI conversion task built on the Mamba State-Space Model (SSM). Unlike the usual U-Net or Transformer backbones, DiffMa uses Mamba to process latent patches.
Specifically, the paper identifies two shortcomings in the scan schemes of existing Mamba-based vision methods: they pay too little attention to the spatial continuity of patch tokens, and they do not account for the tokens' varying importance to the target task. To overcome these issues, the paper introduces a soft mask mechanism that integrates Cross-Sequence Attention into Mamba, and a spiral scan scheme that keeps the scanning sequence spatially continuous. Experimental results show that DiffMa performs well in medical image generation tasks and, in particular, surpasses existing benchmark models in input scaling efficiency. The paper also compares DiffMa in detail with methods based on CNNs, ViTs, and recently proposed Mamba variants, reporting a significant performance improvement for the same number of iterations. Overall, DiffMa obtains a global receptive field while maintaining linear complexity, and achieves strong results on the CT to MRI conversion task.
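To make the spatial-continuity point concrete, the following is a minimal sketch of a spiral ordering over a grid of patch tokens, compared with a plain row-by-row (raster) ordering. It is an illustration only, not the authors' implementation: the function names (`spiral_order`, `raster_order`, `max_step`), the clockwise outside-in direction, and the 4x4 grid size are all assumptions chosen for the example.

```python
def spiral_order(height, width):
    """Return (row, col) grid positions in a clockwise, outside-in spiral.

    Hypothetical helper: the direction and starting corner are assumptions,
    not details taken from the paper.
    """
    top, bottom, left, right = 0, height - 1, 0, width - 1
    order = []
    while top <= bottom and left <= right:
        # Top edge, left to right.
        for c in range(left, right + 1):
            order.append((top, c))
        # Right edge, downwards (skipping the corner already visited).
        for r in range(top + 1, bottom + 1):
            order.append((r, right))
        # Bottom edge, right to left, only if it is a distinct row.
        if top < bottom:
            for c in range(right - 1, left - 1, -1):
                order.append((bottom, c))
        # Left edge, upwards, only if it is a distinct column.
        if left < right:
            for r in range(bottom - 1, top, -1):
                order.append((r, left))
        # Move one ring inwards.
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order


def raster_order(height, width):
    """Return grid positions row by row, as in a plain flattening."""
    return [(r, c) for r in range(height) for c in range(width)]


def max_step(order):
    """Largest Manhattan distance between consecutive positions."""
    return max(
        abs(r1 - r0) + abs(c1 - c0)
        for (r0, c0), (r1, c1) in zip(order, order[1:])
    )


if __name__ == "__main__":
    h, w = 4, 4  # assumed grid of latent patches
    spiral = spiral_order(h, w)
    # Flattened token indices in spiral order (index = row * width + col).
    print([r * w + c for r, c in spiral])
    print("spiral max step:", max_step(spiral))
    print("raster max step:", max_step(raster_order(h, w)))
```

In this sketch every consecutive pair of tokens in the spiral sequence is a direct grid neighbour, whereas the raster sequence jumps from the end of one row to the start of the next. That jump is the kind of spatial discontinuity a continuous scan order is meant to avoid.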