Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays

Zhichao Sun, Yuliang Gu, Yepeng Liu, Zerui Zhang, Zhou Zhao, Yongchao Xu
2024-06-20
Abstract: Anomaly detection in chest X-rays is a critical task. Most methods mainly model the distribution of normal images, and then regard significant deviation from normal distribution as anomaly. Recently, CLIP-based methods, pre-trained on a large number of medical images, have shown impressive performance on zero/few-shot downstream tasks. In this paper, we aim to explore the potential of CLIP-based methods for anomaly detection in chest X-rays. Considering the discrepancy between the CLIP pre-training data and the task-specific data, we propose a position-guided prompt learning method. Specifically, inspired by the fact that experts diagnose chest X-rays by carefully examining distinct lung regions, we propose learnable position-guided text and image prompts to adapt the task data to the frozen pre-trained CLIP-based model. To enhance the model's discriminative capability, we propose a novel structure-preserving anomaly synthesis method within chest X-rays during the training process. Extensive experiments on three datasets demonstrate that our proposed method outperforms some state-of-the-art methods. The code of our implementation is available at https://github.com/sunzc-sunny/PPAD.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses anomaly detection in chest X-ray images. Specifically, the authors explore the potential of CLIP (Contrastive Language-Image Pre-training)-based methods for chest X-ray anomaly detection and propose a new method to address the following challenges:

1. **Discrepancy between pre-training data and task-specific data**: The CLIP-based model is pre-trained on a large number of medical images, whose distribution still differs from the task-specific chest X-ray data. Applying the frozen model directly to the task data may therefore degrade performance.
2. **Improving the discriminative ability of the model**: Traditional anomaly detection methods model the distribution of normal images and flag significant deviations as anomalies, but this approach may not capture the subtle differences that characterize abnormal samples.

To solve these problems, the authors propose a **position-guided prompt learning method (PPAD)**. Its main innovations are:

- **Position-guided text and image prompts**: Because experts diagnose chest X-rays by carefully examining distinct lung regions, the authors propose learnable position-guided text and image prompts that adapt the task data to the frozen pre-trained CLIP-based model.
- **Structure-preserving anomaly synthesis (SAS)**: To enhance the discriminative ability of the model, the authors synthesize anomalies during training, generating more realistic abnormal samples through gamma correction while preserving the integrity of the lung structure.

With these components, PPAD is reported to outperform existing methods on three publicly available chest X-ray datasets, with improvements in accuracy (ACC), area under the curve (AUC), and F1-score.
### Formula Summary

- **Text input embedding**:
  \[ E_{\text{text}} = E_{\text{pos}}^t \oplus P_t \oplus E_{\text{cls}}^t \]
  where $\oplus$ denotes embedding concatenation.
- **Image input embedding**:
  \[ E_{\text{image}} = E_i \odot M + P_i \odot (1 - M) \]
  where $\odot$ denotes element-wise multiplication.
- **Gamma correction formula**:
  \[ \gamma(x) = 1 + \frac{D(x)}{\max_{x' \in M_a} D(x')} \cdot w \]
  where $w > -1$ ensures that $\gamma > 0$.

Through these methods, PPAD not only improves the accuracy of anomaly detection, but also enhances the robustness and generalization ability of the model.
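The three formulas in the Formula Summary can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); it only mirrors the algebra. Several things here are assumptions made for illustration: the function names, the array shapes, the choice of the token axis for concatenation, treating $D$ as a non-negative per-pixel map supplied by the caller, setting $\gamma = 1$ outside the anomaly mask $M_a$, and applying gamma correction in the standard form $x^{\gamma}$ to intensities scaled to $[0, 1]$.

```python
import numpy as np


def text_input_embedding(e_pos_t, p_t, e_cls_t):
    """E_text = E_pos^t (+) P_t (+) E_cls^t.

    Assumption: each input has shape (num_tokens, dim) and the pieces
    are concatenated along the token axis.
    """
    return np.concatenate([e_pos_t, p_t, e_cls_t], axis=0)


def image_input_embedding(e_i, p_i, mask):
    """E_image = E_i * M + P_i * (1 - M), element-wise.

    `mask` holds values in [0, 1] and must broadcast against e_i and p_i.
    """
    mask = np.asarray(mask, dtype=float)
    return e_i * mask + p_i * (1.0 - mask)


def gamma_map(dist, anomaly_mask, w):
    """gamma(x) = 1 + D(x) / max_{x' in M_a} D(x') * w for x in M_a.

    Assumptions: `dist` is a non-negative map D, `anomaly_mask` marks M_a,
    and gamma is 1 (identity) outside M_a.
    """
    if w <= -1:
        raise ValueError("w must be greater than -1 so that gamma stays positive")
    dist = np.asarray(dist, dtype=float)
    inside = np.asarray(anomaly_mask).astype(bool)
    gamma = np.ones_like(dist)
    if inside.any():
        d_max = dist[inside].max()
        if d_max > 0:
            gamma[inside] = 1.0 + dist[inside] / d_max * w
    return gamma


def apply_gamma_correction(image, dist, anomaly_mask, w):
    """Per-pixel gamma correction x ** gamma(x) on intensities in [0, 1]."""
    image = np.clip(np.asarray(image, dtype=float), 0.0, 1.0)
    return image ** gamma_map(dist, anomaly_mask, w)
```

Because the normalized term $D(x) / \max D$ lies in $[0, 1]$ for non-negative $D$, the smallest value $\gamma$ can take is $1 + w$, which is why the constraint $w > -1$ keeps the exponent positive. With $w > 0$ the masked region is darkened, and with $-1 < w < 0$ it is brightened.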