Evaluating Large Language Model‐Assisted Emergency Triage: A Comparison of Acuity Assessments by GPT‐4 and Medical Experts

Gal Ben Haim, Mor Saban, Yiftach Barash, David Cirulnik, Amit Shaham, Ben Zion Eisenman, Livnat Burshtein, Orly Mymon, Eyal Klang
DOI: https://doi.org/10.1111/jocn.17490
2024-11-30
Journal of Clinical Nursing
Abstract: Aim: To evaluate the accuracy of Emergency Severity Index (ESI) assignments made by GPT‐4, a large language model (LLM), compared with those made by senior emergency department (ED) nurses and physicians. Method: An observational study of 100 consecutive adult ED patients was conducted. ESI scores were assigned by GPT‐4, by triage nurses, and by a senior clinician. Both the model and the human experts were provided with the same patient data. Results: GPT‐4 assigned a lower median ESI score (2.0) compared to human evaluators (median 3.0; p
nursing