Accuracy Evaluation of GPT-Assisted Differential Diagnosis in Emergency Department

Fatemeh Shah-Mohammadi, Joseph Finkelstein
DOI: https://doi.org/10.3390/diagnostics14161779
2024-08-15
Abstract: In emergency department (ED) settings, rapid and precise diagnostic evaluations are critical to ensure better patient outcomes and efficient healthcare delivery. This study assesses the accuracy of differential diagnosis lists generated by the third-generation ChatGPT (ChatGPT-3.5) and the fourth-generation ChatGPT (ChatGPT-4) based on electronic health record notes recorded within the first 24 h of ED admission. These models process unstructured text to formulate a ranked list of potential diagnoses. The accuracy of these models was benchmarked against actual discharge diagnoses to evaluate their utility as diagnostic aids. Results indicated that both GPT-3.5 and GPT-4 predicted diagnoses with reasonable accuracy at the body system level, with GPT-4 slightly outperforming its predecessor. However, their performance at the more granular category level was inconsistent, often showing decreased precision. Notably, GPT-4 demonstrated improved accuracy in several critical categories, which underscores its advanced capabilities in managing complex clinical scenarios.
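The benchmarking step described in the abstract, comparing a ranked differential list against the discharge diagnosis, can be illustrated with a small sketch. The abstract does not state which metric the authors used, so the top-k hit rate below is an assumption, and the function names, record layout, and body-system labels are all hypothetical toy values, not data from the study.

```python
from typing import Dict, List, Sequence


def top_k_hit(ranked: Sequence[str], truth: Sequence[str], k: int) -> bool:
    """Return True if any of the first k predicted labels matches a true label."""
    truth_set = {t.lower() for t in truth}
    return any(p.lower() in truth_set for p in ranked[:k])


def top_k_accuracy(cases: List[Dict[str, List[str]]], k: int) -> float:
    """Fraction of cases whose top-k predictions contain a discharge label."""
    if not cases:
        return 0.0
    hits = sum(top_k_hit(c["predicted"], c["actual"], k) for c in cases)
    return hits / len(cases)


# Hypothetical toy cases: labels are at the body system level.
# "predicted" is the model's ranked list; "actual" is the discharge label set.
cases = [
    {"predicted": ["circulatory", "respiratory", "digestive"],
     "actual": ["circulatory"]},
    {"predicted": ["respiratory", "circulatory"],
     "actual": ["circulatory"]},
    {"predicted": ["digestive", "nervous"],
     "actual": ["genitourinary"]},
]

top1 = top_k_accuracy(cases, k=1)
top2 = top_k_accuracy(cases, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

The same function could be applied to labels at the more granular category level simply by changing what the strings in `predicted` and `actual` represent, which is one way to compare the two levels of granularity the abstract mentions.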