Evaluating self-triage accuracy of laypeople, symptom-assessment apps, and large language models: A framework for case vignette development using a representative design approach (RepVig)

Marvin Kopka, Hendrik Napierala, Martin Privoznik, Desislava Sapunova, Sizhuo Zhang, Markus A. Feufel
DOI: https://doi.org/10.1101/2024.04.02.24305193
2024-04-03
Abstract: Most studies evaluating symptom-assessment applications (SAAs) rely on a common set of case vignettes that are authored by clinicians and devoid of context, which may be representative of clinical settings but not of situations where patients use SAAs. Assuming the use case of self-triage, we used representative design principles to sample case vignettes from online platforms where patients describe their symptoms to obtain professional advice and compared triage performance of laypeople, SAAs, and Large Language Models (LLMs) on representative versus standard vignettes. We found performance differences in all three groups depending on vignette type (OR = 1.27 to 3.41, p < .001 to .035) and changed rankings of best-performing SAAs and LLMs. Based on these results, we argue that our representative vignette sampling approach (which we call the RepVig Framework) should replace the practice of using a fixed vignette set as standard for SAA evaluation studies.
Health Informatics