Seunghoi Kim, Henry F. J. Tregidgo, Ahmed K. Eldaly, Matteo Figini, Daniel C. Alexander
Abstract: Low-field (LF) MRI scanners (<1T) are still prevalent in settings with limited resources or unreliable power supply. However, they often yield images with lower spatial resolution and contrast than high-field (HF) scanners. This quality disparity can result in inaccurate clinician interpretations. Image Quality Transfer (IQT) has been developed to enhance the quality of images by learning a mapping function between low and high-quality images. Existing IQT models often fail to restore high-frequency features, leading to blurry output. In this paper, we propose a 3D conditional diffusion model to improve 3D volumetric data, specifically LF MR images. Additionally, we incorporate a cross-batch mechanism into the self-attention and padding of our network, ensuring broader contextual awareness even under small 3D patches. Experiments on the publicly available Human Connectome Project (HCP) dataset for IQT and brain parcellation demonstrate that our model outperforms existing methods both quantitatively and qualitatively. The code is publicly available at https://github.com/edshkim98/DiffusionIQT.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the low image quality of low-field (LF) magnetic resonance imaging (MRI). LF MRI scanners (<1T) are still widely used in settings with limited resources or an unreliable power supply, but the images they produce usually have lower spatial resolution and contrast than high-field images, which can lead to inaccurate interpretation by clinicians. To address this, the paper proposes an Image Quality Transfer (IQT) method based on a 3D conditional diffusion model, which improves LF MRI images by learning a mapping function between low-quality and high-quality images.
### Main contributions
1. **3D conditional diffusion model**: The paper proposes a new 3D conditional diffusion model (DiffusionIQT) designed to enhance 3D volumetric data, specifically low-field MRI images.
2. **Cross-batch mechanism**: A cross-batch mechanism is incorporated into the network's self-attention and padding so that information is shared between small 3D patches, giving broader contextual awareness.
3. **Network architecture**: A 3D neural network with an encoder and a decoder is designed. The encoder uses transformer and convolution blocks to capture local and global information, and the decoder uses channel shuffling and convolution blocks for efficient up-sampling.
4. **Experimental verification**: Experiments on IQT and brain parcellation were carried out on the publicly available Human Connectome Project (HCP) dataset. The results show that the proposed model outperforms existing methods both quantitatively and qualitatively.
### Mathematical model
The diffusion process in the paper is divided into a forward process and a reverse process:
#### Forward process
In the forward process, Gaussian noise is gradually added to the high-quality image \(x\) until it becomes isotropic Gaussian noise at time step \(t = 1\). Formally, for any earlier time step \(s < t\), the distribution of the noisy image \(x_t\) given \(x_s\) is defined as:
\[q(x_t|x_s)=\mathcal{N}(\alpha_{t|s}x_s,\sigma^2_{t|s}I)\]
where
\[\alpha_{t|s}=\frac{\alpha_t}{\alpha_s},\quad\sigma^2_{t|s}=\sigma^2_t-\alpha^2_{t|s}\sigma^2_s\]
Here, \(\alpha_t\) and \(\sigma^2_t\) represent the diffusion coefficient and the noise level (variance) at time step \(t\), respectively; both are set by a cosine schedule.
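The transition coefficients above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the common variance-preserving cosine schedule \(\alpha_t=\cos(\pi t/2)\), \(\sigma^2_t=\sin^2(\pi t/2)\) (the summary only states that a cosine scheduler is used), and the function names `cosine_schedule`, `transition_params`, and `forward_step` are hypothetical. It treats an image as a flat list of voxel values.

```python
import math
import random


def cosine_schedule(t):
    """ASSUMED schedule: alpha_t = cos(pi*t/2), sigma_t^2 = sin^2(pi*t/2)."""
    alpha = math.cos(0.5 * math.pi * t)
    sigma2 = math.sin(0.5 * math.pi * t) ** 2
    return alpha, sigma2


def transition_params(s, t):
    """Return (alpha_{t|s}, sigma^2_{t|s}) for time steps 0 <= s < t <= 1."""
    if not 0.0 <= s < t <= 1.0:
        raise ValueError("expected 0 <= s < t <= 1")
    alpha_s, sigma2_s = cosine_schedule(s)
    alpha_t, sigma2_t = cosine_schedule(t)
    alpha_ts = alpha_t / alpha_s                      # alpha_{t|s} = alpha_t / alpha_s
    sigma2_ts = sigma2_t - alpha_ts ** 2 * sigma2_s   # sigma^2_{t|s}
    return alpha_ts, sigma2_ts


def forward_step(x_s, s, t, rng):
    """Sample x_t ~ N(alpha_{t|s} * x_s, sigma^2_{t|s} * I), voxel by voxel."""
    alpha_ts, sigma2_ts = transition_params(s, t)
    std = math.sqrt(sigma2_ts)
    return [alpha_ts * v + std * rng.gauss(0.0, 1.0) for v in x_s]


if __name__ == "__main__":
    rng = random.Random(0)
    x_s = [0.2, -0.5, 1.0]  # toy "voxels" at time s
    print(forward_step(x_s, s=0.3, t=0.7, rng=rng))
```

With these definitions, noising from \(0\) to \(s\) and then from \(s\) to \(t\) gives the same mean scaling and variance as noising directly from \(0\) to \(t\): \(\alpha_{t|s}\alpha_s=\alpha_t\) and \(\alpha^2_{t|s}\sigma^2_s+\sigma^2_{t|s}=\sigma^2_t\). This consistency is what the definition of \(\sigma^2_{t|s}\) is chosen to guarantee.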
#### Reverse process
In the reverse process, the goal is to start from isotropic Gaussian noise and denoise it step by step until the clean high-quality image is restored. This is achieved by maximizing the likelihood \(p(x)\). For a finite number of time steps \(T\), the generative model expresses the data \(x\) as:
\[p(x)=\int p(x_1)p(x|x_0,x_c)\prod_{i = 1}^T p(x_{s(i)}|x_{t(i)},x_c)dx_{0:1}\]
where \(p(x_1)=\mathcal{N}(x_1;0,I)\) and \(x_c\) is the conditioning low-quality MRI image. To maximize the likelihood, a neural network is trained so that \(p_\theta(x_s|x_t,x_c)\) approximates \(q(x_s|x_t,x)\), where \(\theta\) denotes the learnable parameters. Because the forward transitions are Gaussian, Bayes' theorem together with Gaussian conjugacy ensures that \(q(x_s|x_t,x)\) is also a Gaussian distribution. The model is therefore \(p_\theta(x_s|x_t,x_c)=\mathcal{N}(x_s;\hat{\mu}_\theta(x_t;t,x_c),\sigma^2_Q)\), where \(\hat{\mu}_\theta\) represents using neural