Structured Clinical Reasoning Prompt Enhances LLM's Diagnostic Capabilities in Diagnosis Please Quiz Cases

Yuki Sonoda, Ryo Kurokawa, Akifumi Hagiwara, Yusuke Asari, Takahiro Fukushima, Jun Kanzawa, Wataru Gonoi, Osamu Abe
DOI: https://doi.org/10.1101/2024.09.01.24312894
2024-09-03
Abstract:
Background: Large Language Models (LLMs) show promise in medical diagnosis, but their performance varies with prompting. Recent studies suggest that modifying prompts may enhance diagnostic capabilities.
Objective: This study aimed to test whether a prompting approach that aligns with general clinical reasoning methodology can enhance the medical diagnostic capabilities of LLMs. Specifically, the approach separates summarizing clinical information from making a diagnosis based on that summary, instead of processing the case in one step.
Methods: We used 322 quiz questions from the Diagnosis Please cases of Radiology (1998-2023). We employed Claude 3.5 Sonnet, a state-of-the-art LLM, to compare three approaches: 1) a conventional zero-shot chain-of-thought prompt, as a baseline; 2) a two-step approach, in which the LLM organizes the patient history and imaging findings, then provides diagnoses; and 3) a summary-only approach, in which only the LLM-generated summary is used for diagnosis.
Results: The two-step approach significantly outperformed both the baseline and summary-only methods in diagnostic accuracy, as determined by McNemar tests. Primary diagnosis accuracy was 60.6% for the two-step approach, compared to 56.5% for baseline (p=0.042) and 56.3% for summary-only (p=0.035). For the top three diagnoses, accuracy was 70.5%, 66.5%, and 65.5% respectively (p=0.005 versus baseline, p=0.008 versus summary-only). No significant differences were observed between the baseline and summary-only approaches.
Conclusion: Our results indicate that a structured clinical reasoning approach enhances the diagnostic accuracy of LLMs. This method shows potential as a valuable tool for deriving diagnoses from free-text clinical information. The approach aligns well with established clinical reasoning processes, suggesting its potential applicability in real-world clinical settings.
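The abstract reports McNemar tests but does not say which variant was used or give the discordant-pair counts. As an illustration only, the following is a minimal sketch of the exact (binomial) form of the test for two prompting methods scored on the same cases. The function names `discordant_counts` and `mcnemar_exact` are hypothetical and not taken from the paper.

```python
from math import comb


def discordant_counts(correct_a, correct_b):
    """Count paired cases where exactly one of the two methods is correct.

    correct_a and correct_b are equal-length sequences of booleans, one entry
    per quiz case, for method A and method B respectively.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired result lists must have equal length")
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis each discordant case is equally likely to favor
    either method, so the smaller count follows Binomial(n, 0.5).
    Assumption: the exact variant is used here for simplicity; the paper may
    have used the chi-square approximation instead.
    """
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Only the cases where the two methods disagree enter the test. Cases that both methods answer correctly, or both answer incorrectly, carry no information about which method is better, which is why a paired test suits this design better than comparing two independent proportions.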
What problem does this paper attempt to address?
This paper asks whether changing the prompting method can improve the accuracy of large language models (LLMs) in medical diagnosis. Specifically, the researchers test a two-step method based on structured clinical reasoning: the model first summarizes the clinical information, then makes a diagnosis. The method is meant to mirror how physicians approach complex cases, systematically organizing the patient's history and imaging findings before diagnosing rather than drawing a conclusion directly from the raw information. The motivation is that LLMs show potential in medical diagnosis but their performance depends on how they are prompted. By comparing diagnostic accuracy across prompting strategies, the researchers aim to verify whether the structured clinical reasoning prompt improves the diagnostic performance of LLMs.
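The three prompting strategies can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the prompt wording is invented, `llm` is a hypothetical callable that takes a prompt string and returns the model's text, and the two-step variant is assumed to keep the original case text in context alongside the summary, which the abstract implies but does not state.

```python
from typing import Callable

# Hypothetical interface: any function that maps a prompt to the model's reply.
LLM = Callable[[str], str]

# Illustrative prompt templates; the paper's actual prompts are not given here.
SUMMARIZE = "Organize the patient history and imaging findings in this case:\n\n{case}"
DIAGNOSE = (
    "Think step by step, then give the most likely diagnosis "
    "and the top three differential diagnoses.\n\n{context}"
)


def summarize(llm: LLM, case: str) -> str:
    """Step 1: ask the model to organize history and imaging findings."""
    return llm(SUMMARIZE.format(case=case))


def baseline(llm: LLM, case: str) -> str:
    """One-step zero-shot chain-of-thought diagnosis from the raw case text."""
    return llm(DIAGNOSE.format(context=case))


def two_step(llm: LLM, case: str) -> str:
    """Summarize first, then diagnose with both the case and the summary."""
    summary = summarize(llm, case)
    context = f"Case:\n{case}\n\nSummary:\n{summary}"
    return llm(DIAGNOSE.format(context=context))


def summary_only(llm: LLM, case: str) -> str:
    """Summarize first, then diagnose from the summary alone."""
    summary = summarize(llm, case)
    return llm(DIAGNOSE.format(context=f"Summary:\n{summary}"))
```

Passing the model in as a plain callable keeps the sketch independent of any particular client library, and makes it easy to substitute a stub when checking that each strategy sends the intended context.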