Abstract: Medical vision-language models often struggle to generate accurate quantitative measurements in radiology reports, leading to hallucinations that undermine clinical reliability. We introduce FactCheXcker, a modular framework that de-hallucinates radiology report measurements by leveraging an improved query-code-update paradigm. Specifically, FactCheXcker employs specialized modules and the code generation capabilities of large language models to solve measurement queries generated from the original report. After measurable findings are extracted, the results are incorporated into an updated report. We evaluate FactCheXcker on endotracheal tube placement, which accounts for an average of 78% of report measurements, using the MIMIC-CXR dataset and 11 medical report-generation models. Our results show that FactCheXcker significantly reduces hallucinations, improves measurement precision, and maintains the quality of the original reports. Specifically, FactCheXcker improves the performance of all 11 models and achieves an average improvement of 94.0% in reducing measurement hallucinations, as measured by mean absolute error.
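To make the headline metric concrete, here is a minimal sketch of how a relative reduction in mean absolute error (MAE) could be computed. The function names and all numbers are hypothetical and chosen for illustration only; they are not results from the paper, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average absolute difference between predicted and reference measurements."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def relative_improvement(mae_before: float, mae_after: float) -> float:
    """Percentage reduction in MAE relative to the uncorrected reports."""
    if mae_before <= 0:
        raise ValueError("mae_before must be positive")
    return 100.0 * (mae_before - mae_after) / mae_before


# Hypothetical ETT-to-carina distances in cm (illustrative values only).
reference = [4.0, 5.0, 3.0]
original_report = [8.0, 1.0, 7.0]   # values stated in the generated reports
updated_report = [4.5, 5.0, 2.5]    # values after a correction step

mae_before = mean_absolute_error(original_report, reference)
mae_after = mean_absolute_error(updated_report, reference)
print(f"MAE before: {mae_before:.2f} cm, after: {mae_after:.2f} cm")
print(f"Relative improvement: {relative_improvement(mae_before, mae_after):.1f}%")
```

An average improvement across several models would then be the mean of the per-model percentages, assuming each model is weighted equally.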

### What problem does this paper attempt to solve?

This paper addresses the difficulty that medical report-generation models have in producing accurate quantitative measurements in chest X-ray (CXR) reports. Existing medical vision-language models often make errors, or "hallucinate", on quantitative measurements when generating radiology reports: the generated content does not match the actual image, which seriously undermines clinical reliability.

#### Main problems:

1. **Measurement hallucinations**: Medical report-generation models are prone to inaccurate numerical predictions on tasks that require precise measurements, such as determining the size of lung nodules or measuring the distance from the endotracheal tube (ETT) to the carina.
2. **Clinical reliability**: Incorrect or missing measurement values may lead to adverse clinical outcomes because many reporting guidelines rely on precise thresholds. For example, an incorrectly positioned endotracheal tube may lead to serious complications such as hypoxia, pneumothorax, and even death.
3. **Limitations of existing models**: Current medical report-generation models lack the ability to accurately interpret fine-grained quantitative information and spatial relationships, and they perform especially poorly on key measurement tasks in medical images.

### Solutions:

To address these challenges, the authors propose the **FactCheXcker** framework. FactCheXcker is a modular tool pipeline that re-evaluates and updates measurement values in model-generated radiology reports without retraining or modifying the original model. Its core components are:

- **Query Generator**: Generates measurement queries from the original report and identifies potential measurement discrepancies.
- **Code Generator**: Generates executable code from the queries to obtain accurate measurements from the images.
- **Report Updater**: Integrates the verified measurements into the report, updating or deleting inaccurate content.

With this approach, FactCheXcker can significantly reduce measurement hallucinations, improve measurement accuracy, and maintain the quality of the original report. Experimental results show that FactCheXcker reduced measurement hallucinations, as measured by mean absolute error, by an average of 94% across the evaluated models and significantly improved the accuracy of endotracheal tube position measurements.

### Conclusion:

FactCheXcker offers an effective solution to the problem of measurement hallucinations in medical report generation, enhancing the reliability and practicality of these models in clinical applications.
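The query-code-update flow described above can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the real framework uses large language models for query and code generation, whereas this sketch substitutes a regular expression and a caller-supplied function. All names here (`MEASUREMENT`, `Query`, `generate_query`, `run_measurement`, `update_report`, `fact_check`, and the `measure_ett` callback) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical pattern for claims such as "4.5 cm above the carina".
MEASUREMENT = re.compile(r"(\d+(?:\.\d+)?)\s*cm\s+above\s+the\s+carina", re.IGNORECASE)


@dataclass
class Query:
    finding: str          # what to measure, e.g. "endotracheal tube"
    reported_cm: float    # the value stated in the original report


def generate_query(report: str) -> Optional[Query]:
    """Query step: find an ETT measurement claim in the report text."""
    match = MEASUREMENT.search(report)
    if match is None or "endotracheal tube" not in report.lower():
        return None
    return Query("endotracheal tube", float(match.group(1)))


def run_measurement(query: Query, image: object,
                    measure_ett: Callable[[object], Optional[float]]) -> Optional[float]:
    """Code step: stand-in for generated code that calls a measurement module."""
    return measure_ett(image)


def update_report(report: str, measured_cm: Optional[float]) -> str:
    """Update step: replace the stated value, or drop the claim if unmeasurable."""
    if measured_cm is None:
        sentences = re.split(r"(?<=\.)\s+", report)
        return " ".join(s for s in sentences if not MEASUREMENT.search(s))
    return MEASUREMENT.sub(f"{measured_cm:.1f} cm above the carina", report, count=1)


def fact_check(report: str, image: object,
               measure_ett: Callable[[object], Optional[float]]) -> str:
    """Run query -> code -> update; reports with no measurement claim pass through."""
    query = generate_query(report)
    if query is None:
        return report
    return update_report(report, run_measurement(query, image, measure_ett))


report = "The endotracheal tube terminates 9.0 cm above the carina. No pneumothorax."
print(fact_check(report, None, lambda image: 4.2))
```

Because the original report-generation model is only read, never modified, a pipeline of this shape can be placed behind any model, which matches the summary's point that no retraining is required.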

FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
