Abstract: A pre-trained visual-language model, contrastive language-image pre-training (CLIP), successfully accomplishes various downstream tasks with text prompts, such as finding images or localizing regions within an image. Despite CLIP's strong multi-modal capabilities, it remains limited in specialized environments, such as medical applications. To address this, many CLIP variants, such as BioMedCLIP and MedCLIP-SAMv2, have emerged, but false positives in normal regions persist. We therefore pursue a simple yet important goal: reducing false positives in medical anomaly detection. We introduce a Contrastive LAnguage Prompting (CLAP) method that leverages both positive and negative text prompts. This straightforward approach identifies potential lesion regions through visual attention to the positive prompts in the given image. To reduce false positives, we attenuate attention on normal regions using negative prompts. Extensive experiments with the BMAD dataset, including six biomedical benchmarks, demonstrate that the CLAP method enhances anomaly detection performance. Our future plans include developing an automated fine prompting method for more practical usage.

What problem does this paper attempt to address?

This paper attempts to solve the false-positive problem that occurs when using vision-language models (such as CLIP) in medical anomaly detection. Specifically, the authors point out that although pre-trained vision-language models perform well in multi-modal data processing, they still have limitations in professional fields such as medical imaging. In particular, they are prone to misjudging normal regions as abnormal regions, resulting in false-positive results.

### Main objectives of the paper:

1. **Reduce false positives**: By introducing the Contrastive LAnguage Prompting (CLAP) method and using positive and negative text prompts, the accuracy of medical anomaly detection is improved.
2. **Improve anomaly detection performance**: Through extensive experiments on the BMAD dataset, the performance of the CLAP method on multiple biomedical benchmarks is verified, showing that it can reduce false positives and improve the accuracy of anomaly detection.

### Specific problem description:

- **Limitations of existing methods**: Existing CLIP-based variants (such as BioMedCLIP and MedCLIP-SAMv2) bring certain improvements in medical image processing, but still suffer from false positives, that is, wrongly identifying normal regions as abnormal regions.
- **Impact of false positives**: False positives may lead to unnecessary medical procedures, increase the burden on the medical system, and may cause harm to patients.

### Solutions:

- **CLAP method**: By combining positive and negative text prompts, the CLAP method identifies potential lesion areas in a given image while reducing attention to normal areas, thereby reducing the occurrence of false positives.
- **Application of attention mechanism**: Positive prompts guide the model to focus on potential lesion areas, while negative prompts help reduce attention to normal areas, thereby locating lesion areas more accurately.
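
The idea of combining positive and negative prompts can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not state how the two attention maps are combined, so the sketch assumes cosine similarity between patch and text embeddings, min-max normalization, and a clipped subtraction. The function names and the `neg_weight` parameter are hypothetical.

```python
import numpy as np


def cosine_similarity_map(patch_embeddings, text_embedding):
    """Cosine similarity between each image-patch embedding and one text embedding."""
    patches = patch_embeddings / np.linalg.norm(patch_embeddings, axis=-1, keepdims=True)
    text = text_embedding / np.linalg.norm(text_embedding)
    return patches @ text


def min_max_normalize(values, eps=1e-8):
    """Rescale a map to roughly [0, 1] so the two prompt maps are comparable."""
    return (values - values.min()) / (values.max() - values.min() + eps)


def contrastive_prompt_map(patch_embeddings, pos_text, neg_text, neg_weight=1.0):
    """Assumed combination rule: positive-prompt map minus weighted negative-prompt map.

    Patches that respond to the negative ("normal") prompt are attenuated;
    values are clipped at zero so the result can be read as an anomaly score.
    """
    pos_map = min_max_normalize(cosine_similarity_map(patch_embeddings, pos_text))
    neg_map = min_max_normalize(cosine_similarity_map(patch_embeddings, neg_text))
    return np.clip(pos_map - neg_weight * neg_map, 0.0, None)


if __name__ == "__main__":
    # Toy 2-D embeddings: a lesion-like patch, a normal-like patch, an ambiguous patch.
    patches = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    pos_prompt = np.array([1.0, 0.0])  # stands in for a lesion prompt embedding
    neg_prompt = np.array([0.0, 1.0])  # stands in for a normal-tissue prompt embedding
    print(contrastive_prompt_map(patches, pos_prompt, neg_prompt))
```

In this toy setup the ambiguous patch responds to the positive prompt alone, which would count as a false positive under positive-only prompting, while the subtraction suppresses it because it responds equally to the negative prompt.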

### Experimental verification:

- **Dataset**: The BMAD dataset, which contains six biomedical benchmarks and covers five anatomical structures.
- **Evaluation metric**: AUROC (Area Under the Receiver Operating Characteristic Curve), which is used to evaluate anomaly detection performance.
- **Experimental results**: The CLAP method significantly outperforms methods using only positive prompts (such as DINO and PLP) on multiple subsets, especially when dealing with images with small and irregular patterns.

### Summary:

The paper proposes a novel CLAP method that aims to reduce false positives in medical anomaly detection by combining positive and negative text prompts. The experimental results show that the CLAP method improves the accuracy of anomaly detection across multiple types of medical images and outperforms existing single-prompt methods. Future work will further optimize the automated generation of language prompts to support a wider range of clinical applications.
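
For readers unfamiliar with the AUROC metric named under experimental verification, the following is a minimal sketch of how it can be computed from anomaly scores and binary labels. It uses the pairwise (Mann-Whitney) formulation, which is quadratic in the number of samples and meant only for illustration; the function name `auroc` is an assumption, and the paper's own evaluation code is not described in the summary.

```python
import numpy as np


def auroc(scores, labels):
    """AUROC as the probability that a random anomalous sample scores higher
    than a random normal sample, counting ties as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]    # scores of anomalous samples
    neg = scores[~labels]   # scores of normal samples
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]  # every (anomalous, normal) pair
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)
```

Under this definition, a normal sample that receives a high anomaly score (a false positive) lowers the AUROC because it outranks some anomalous samples, which is why the metric is sensitive to the problem the paper targets.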

Contrastive Language Prompting to Ease False Positives in Medical Anomaly Detection

Unified Medical Image-Text-Label Contrastive Learning With Continuous Prompt

CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning

When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection

MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection

MedCLIP: Contrastive Learning from Unpaired Medical Images and Text

Exploring low-resource medical image classification with weakly supervised prompt learning

AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection

MCPL: Multi-modal Collaborative Prompt Learning for Medical Vision-Language Model

Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images

Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays

GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection

Clinical Contrastive Learning for Biomarker Detection

CLIP in Medical Imaging: A Comprehensive Survey

Improving Medical Multi-modal Contrastive Learning with Expert Annotations

Aligning Medical Images with General Knowledge from Large Language Models

BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models

Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images

AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection

Med-PerSAM: One-Shot Visual Prompt Tuning for Personalized Segment Anything Model in Medical Domain