Pedro Sanchez, Antanas Kascenas, Xiao Liu, Alison Q. O'Neil, Sotirios A. Tsaftaris
Abstract: Reducing the requirement for densely annotated masks in medical image segmentation is important due to cost constraints. In this paper, we consider the problem of inferring pixel-level predictions of brain lesions by only using image-level labels for training. By leveraging recent advances in generative diffusion probabilistic models (DPM), we synthesize counterfactuals of "How would a patient appear if X pathology was not present?". The difference image between the observed patient state and the healthy counterfactual can be used for inferring the location of pathology. We generate counterfactuals that correspond to the minimal change of the input such that it is transformed to healthy domain. This requires training with healthy and unhealthy data in DPMs. We improve on previous counterfactual DPMs by manipulating the generation process with implicit guidance along with attention conditioning instead of using classifiers. Code is available at https://github.com/vios-s/Diff-SCM.
What problem does this paper attempt to address?
This paper addresses how to localize brain lesions (such as brain tumors) accurately with a model trained only on image-level labels, without pixel-level annotations. Specifically, the authors use diffusion probabilistic models (DPMs), a class of generative models, to synthesize counterfactual images of "what a patient would look like without a certain pathological condition". The location of the lesion is then inferred from the difference between the actual patient image and its healthy counterfactual.
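The difference-image idea can be illustrated with a minimal sketch. It assumes the healthy counterfactual has already been produced by some generative model; the function names `anomaly_map` and `threshold_mask`, the channel-averaged absolute difference, and the threshold value are illustrative choices, not details taken from the paper.

```python
import numpy as np

def anomaly_map(observed, counterfactual):
    """Per-pixel absolute difference, averaged over the channel axis (axis 0)."""
    diff = np.abs(observed.astype(np.float64) - counterfactual.astype(np.float64))
    return diff.mean(axis=0)

def threshold_mask(heatmap, threshold):
    """Binary lesion mask: True where the heatmap exceeds the threshold."""
    return heatmap > threshold

# Toy data: a 1-channel 4x4 "scan" with a bright 2x2 "lesion".
observed = np.zeros((1, 4, 4))
observed[0, 1:3, 1:3] = 1.0

# Stand-in for the model output: the same scan with the lesion removed.
counterfactual = np.zeros((1, 4, 4))

heatmap = anomaly_map(observed, counterfactual)
mask = threshold_mask(heatmap, threshold=0.5)  # threshold is an arbitrary choice
```

In this toy case the mask is true exactly on the four lesion pixels. On real scans the heatmap would also contain reconstruction noise, so the choice of threshold matters.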
### Main problems
1. **Reducing the need for dense annotation masks**: High-quality pixel-level annotations are expensive to obtain, especially in medical imaging. This paper aims to reduce the need for such dense annotations by training only with image-level labels.
2. **Improving the accuracy of lesion localization**: Existing anomaly segmentation methods (such as VAEs and GANs) often detect only hyperintensities when applied to brain lesions and cannot localize the lesions accurately. This paper proposes to localize the lesion area more accurately by generating a counterfactual image of the healthy state and computing its difference from the original image.
### Solution overview
- **Diffusion Probabilistic Models (DPMs)**: Use DPMs to generate counterfactual images of the healthy state instead of relying solely on anomaly detection models. A DPM is trained by gradually adding noise to images and learning to reverse that corruption; new images are generated by running the learned denoising process.
- **Counterfactual generation**: Transform the input image into a healthy state with minimal intervention, so that regions outside the lesion area remain unchanged.
- **Implicit guidance and attention mechanism**: To avoid using additional classifiers, this paper introduces implicit guidance and attention conditioning, which simplifies the training process and improves robustness.
- **Dynamic normalization**: To prevent pixel values in the latent space from saturating, this paper proposes a dynamic normalization method so that the generated images are of higher quality and easier to manipulate.
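This summary does not spell out the implicit guidance rule. As a hedged illustration, the sketch below assumes the common classifier-free guidance form, in which a single network is evaluated with and without the condition and the two noise predictions are combined. The function name `guided_eps`, the guidance weight `w`, and the exact combination are assumptions and may differ from what the paper uses.

```python
import numpy as np

def guided_eps(eps_cond, eps_uncond, w):
    """Combine conditional and unconditional noise predictions.

    Assumed classifier-free form: (1 + w) * eps_cond - w * eps_uncond.
    w = 0 recovers the purely conditional prediction; larger w pushes the
    sample further toward the condition (here: the "healthy" label).
    """
    return (1.0 + w) * eps_cond - w * eps_uncond

# Toy predictions standing in for two forward passes of the same network.
eps_cond = np.full((1, 4, 4), 1.0)    # prediction given the "healthy" condition
eps_uncond = np.zeros((1, 4, 4))      # prediction with the condition dropped

eps_plain = guided_eps(eps_cond, eps_uncond, w=0.0)
eps_strong = guided_eps(eps_cond, eps_uncond, w=2.0)
```

Because no separate classifier is involved, no classifier gradients are needed at sampling time, which is consistent with the simplification the summary describes.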
### Conclusion
The paper validates the proposed method through a series of experiments on brain lesion localization, where it outperforms several existing generative models. In addition, the method requires no additional classifier guidance, which simplifies the training process and improves robustness to hyperparameter selection.
### Formula summary
The key formulas involved in the paper include:
1. Noise addition in the diffusion process:
\[
p(x_t|x_0) = \mathcal{N}(x_t; \sqrt{\alpha_t} x_0, (1 - \alpha_t) I)
\]
where \(\alpha_t=\prod_{j = 0}^t(1 - \beta_j)\), and \(I\) is the identity matrix.
2. Training objective:
\[
\theta^*=\arg\min_\theta\mathbb{E}_{x_0, t, \epsilon}[\|\epsilon_\theta(x_t, c, t)-\epsilon\|^2_2]
\]
where \(x_t = \sqrt{\alpha_t}x_0+\sqrt{1 - \alpha_t}\epsilon\).
3. Reverse sampling process (DDIM):
\[
x_{t - 1}=\sqrt{\alpha_{t - 1}}\left(\frac{x_t-\sqrt{1 - \alpha_t}\,\epsilon_\theta(x_t, c, t)}{\sqrt{\alpha_t}}\right)+\sqrt{1 - \alpha_{t - 1}}\,\epsilon_\theta(x_t, c, t)
\]
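The three formulas can be sketched together in NumPy. This is a minimal illustration, not the authors' implementation: the linear \(\beta\) schedule, the helper names, and the placeholder `zero_model` are assumptions. The DDIM step reads the bracketed term as the predicted clean image \((x_t-\sqrt{1-\alpha_t}\,\epsilon_\theta)/\sqrt{\alpha_t}\), as in the standard DDIM formulation.

```python
import numpy as np

def cumulative_alphas(betas):
    """alpha_t = prod_{j <= t} (1 - beta_j), the cumulative signal level."""
    return np.cumprod(1.0 - betas)

def add_noise(x0, eps, alpha_t):
    """Formula 1: x_t = sqrt(alpha_t) * x0 + sqrt(1 - alpha_t) * eps."""
    return np.sqrt(alpha_t) * x0 + np.sqrt(1.0 - alpha_t) * eps

def denoising_loss(eps_model, x0, c, alphas, rng):
    """Formula 2: one Monte Carlo sample of the squared-error objective."""
    t = int(rng.integers(0, len(alphas)))      # random timestep
    eps = rng.standard_normal(x0.shape)        # target noise
    x_t = add_noise(x0, eps, alphas[t])
    return float(np.sum((eps_model(x_t, c, t) - eps) ** 2))

def ddim_step(x_t, eps_pred, alpha_t, alpha_prev):
    """Formula 3: deterministic DDIM update from step t to step t - 1."""
    x0_pred = (x_t - np.sqrt(1.0 - alpha_t) * eps_pred) / np.sqrt(alpha_t)
    return np.sqrt(alpha_prev) * x0_pred + np.sqrt(1.0 - alpha_prev) * eps_pred

def zero_model(x_t, c, t):
    """Hypothetical stand-in for a trained network: always predicts zero noise."""
    return np.zeros_like(x_t)

# Assumed linear beta schedule with 1000 steps (not specified in the text).
betas = np.linspace(1e-4, 0.02, 1000)
alphas = cumulative_alphas(betas)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(1, 8, 8))   # toy "image"
eps = rng.standard_normal(x0.shape)

# Noise the image to t = 500, then take one DDIM step back to t = 499.
# The true eps is passed as the prediction; a trained network would only
# approximate it.
x_t = add_noise(x0, eps, alphas[500])
x_prev = ddim_step(x_t, eps, alphas[500], alphas[499])

loss = denoising_loss(zero_model, x0, c=0, alphas=alphas, rng=rng)
```

With the exact noise as the prediction, the DDIM step lands on the same image noised to level \(\alpha_{t-1}\), which is a quick consistency check between formulas 1 and 3.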