Large Language Models Perform Diagnostic Reasoning

Cheng-Kuang Wu, Wei-Lin Chen, Hsin-Hsi Chen
DOI: https://doi.org/10.48550/arXiv.2307.08922
2023-07-18
Abstract: We explore the extension of chain-of-thought (CoT) prompting to medical reasoning for the task of automatic diagnosis. Motivated by doctors' underlying reasoning process, we present Diagnostic-Reasoning CoT (DR-CoT). Empirical results demonstrate that by simply prompting large language models trained only on general text corpora with two DR-CoT exemplars, diagnostic accuracy improves by 15% compared to standard prompting. Moreover, the gap reaches a pronounced 18% in out-of-domain settings. Our findings suggest that expert-knowledge reasoning in large language models can be elicited through proper prompting.
Computation and Language
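To make the contrast between standard prompting and DR-CoT prompting concrete, the Python sketch below assembles a two-exemplar few-shot prompt in both styles. It is not the authors' prompt: the exemplar fields (dialogue, reasoning, differential diagnosis, diagnosis), the section labels, the helper names `Exemplar`, `format_exemplar` and `build_prompt`, and the angle-bracket placeholder text are all hypothetical. The only difference between the two modes is whether each exemplar spells out its reasoning and DDx list before the final answer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exemplar:
    """One hypothetical few-shot exemplar; the field layout is an assumption."""
    dialogue: str      # patient findings gathered in the conversation
    reasoning: str     # step-by-step diagnostic reasoning (DR-CoT only)
    ddx: List[str]     # differential diagnosis list (DR-CoT only)
    diagnosis: str     # final answer


def format_exemplar(ex: Exemplar, with_reasoning: bool) -> str:
    """Render one exemplar; standard prompting drops the reasoning and the DDx list."""
    lines = [ex.dialogue]
    if with_reasoning:
        lines.append(f"Reasoning: {ex.reasoning}")
        lines.append("Differential diagnosis: " + ", ".join(ex.ddx))
    lines.append(f"Diagnosis: {ex.diagnosis}")
    return "\n".join(lines)


def build_prompt(exemplars: List[Exemplar], query: str, with_reasoning: bool = True) -> str:
    """Concatenate the exemplars, then end with an open cue for the model to complete."""
    blocks = [format_exemplar(ex, with_reasoning) for ex in exemplars]
    cue = "Reasoning:" if with_reasoning else "Diagnosis:"
    blocks.append(f"{query}\n{cue}")
    return "\n\n".join(blocks)


# Placeholder exemplars (hypothetical; not taken from the paper).
EXEMPLARS = [
    Exemplar("Patient: <findings for case 1>", "<reasoning for case 1>",
             ["<condition A>", "<condition B>"], "<condition A>"),
    Exemplar("Patient: <findings for case 2>", "<reasoning for case 2>",
             ["<condition C>", "<condition D>"], "<condition D>"),
]

if __name__ == "__main__":
    print(build_prompt(EXEMPLARS, "Patient: <findings for new case>"))
    print("-" * 40)
    print(build_prompt(EXEMPLARS, "Patient: <findings for new case>", with_reasoning=False))
```

The resulting string would be sent to a general-purpose language model. Ending the DR-CoT prompt with the `Reasoning:` cue invites the model to write out its reasoning and a DDx list before committing to a diagnosis, whereas the standard prompt asks for the diagnosis directly.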
What problem does this paper attempt to address?
The paper addresses diagnostic reasoning in Automatic Diagnosis (AD). The authors propose Diagnostic-Reasoning Chain-of-Thought (DR-CoT) prompting to improve the performance of large language models on automatic diagnosis tasks. With DR-CoT prompts, the model collects clinical evidence more effectively during its conversation with the patient and forms a Differential Diagnosis (DDx) list, which improves the accuracy of the final diagnosis. Experiments show that prompting large language models trained only on general text corpora with two DR-CoT exemplars improves diagnostic accuracy by 15% over standard prompting, and the gap widens to 18% in out-of-domain settings. The paper also introduces a language-model role-playing evaluation framework that simulates doctor-patient conversations in order to assess model performance. The authors present this as the first application of large language models to automatic diagnosis systems, and the results suggest that appropriate prompting can elicit expert-knowledge reasoning already latent in these models.
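The role-playing evaluation can be pictured as a loop in which one model plays the doctor and another plays the patient. The sketch below is a minimal version of such a loop under stated assumptions, not the paper's framework: in place of language models it uses hypothetical rule-based agents (`make_doctor`, `make_patient`), and the `DIAGNOSIS:` end-of-consultation marker, the `max_turns` budget, and the exact-match `accuracy` metric are all illustrative choices.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Turn = Tuple[str, str]                # (speaker, utterance)
Agent = Callable[[List[Turn]], str]   # maps conversation history to the next utterance


def run_consultation(doctor: Agent, patient: Agent,
                     max_turns: int = 10) -> Tuple[Optional[str], List[Turn]]:
    """Alternate doctor and patient turns until the doctor commits to a diagnosis."""
    history: List[Turn] = []
    for _ in range(max_turns):
        message = doctor(history)
        history.append(("doctor", message))
        if message.startswith("DIAGNOSIS:"):   # assumed end-of-consultation marker
            return message[len("DIAGNOSIS:"):].strip(), history
        history.append(("patient", patient(history)))
    return None, history  # the doctor never committed within the turn budget


def accuracy(predictions: Sequence[Optional[str]], labels: Sequence[str]) -> float:
    """Fraction of cases whose predicted diagnosis exactly matches the label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Rule-based stand-ins for the two LLM roles (hypothetical, for illustration only).
def make_patient(profile: Dict[str, str]) -> Agent:
    def patient(history: List[Turn]) -> str:
        question = history[-1][1]          # the doctor's latest question
        for keyword, answer in profile.items():
            if keyword in question:
                return answer
        return "I'm not sure."
    return patient


def make_doctor(questions: List[str], final: str) -> Agent:
    def doctor(history: List[Turn]) -> str:
        asked = sum(1 for speaker, _ in history if speaker == "doctor")
        return questions[asked] if asked < len(questions) else f"DIAGNOSIS: {final}"
    return doctor


if __name__ == "__main__":
    patient = make_patient({"fever": "Yes, since yesterday."})
    doctor = make_doctor(["Do you have a fever?"], "<condition A>")
    diagnosis, transcript = run_consultation(doctor, patient)
    for speaker, text in transcript:
        print(f"{speaker}: {text}")
    print("accuracy:", accuracy([diagnosis], ["<condition A>"]))
```

Replacing the stand-ins with calls to real models (for example, a doctor model given either a DR-CoT or a standard prompt) would let the same loop compare prompting styles on identical patient profiles. Exact string matching is a simplification; real model outputs would need normalization before scoring.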