Abstract:

Background: Symptom checker apps are patient-facing decision support systems aimed at providing advice to laypersons on whether, where, and how to seek health care (disposition advice). Such advice can improve laypersons' self-assessment and ultimately improve medical outcomes. Past research has mainly focused on the accuracy of symptom checker apps' suggestions. To support decision-making, such apps need to provide not only accurate but also trustworthy advice. To date, only a few studies have addressed the extent to which laypersons trust symptom checker app advice or the factors that moderate their trust. Studies on general decision support systems have shown that the way an automated system is framed (as anthropomorphic or as emphasizing expertise), for example, through icons symbolizing artificial intelligence (AI), affects users' trust.

Objective: This study aims to identify the factors influencing laypersons' trust in the advice provided by symptom checker apps. Primarily, we investigated whether designs that use anthropomorphic framing or frame the app as an AI increase users' trust compared with no such framing.

Methods: Through a web-based survey, we recruited 494 US residents with no professional medical training. Participants first appraised the urgency of a fictitious patient description (case vignette). A decision aid (mock symptom checker app) then provided disposition advice contradicting their appraisal, and they reappraised the vignette. Participants were randomized into 3 groups: 2 experimental groups using visual framing (anthropomorphic, 160/494, 32.4%, vs AI, 161/494, 32.6%) and a neutral group without such framing (173/494, 35%).

Results: Most participants (384/494, 77.7%) followed the decision aid's advice, regardless of its urgency level. Neither anthropomorphic framing (odds ratio 1.120, 95% CI 0.664-1.897) nor framing as AI (odds ratio 0.942, 95% CI 0.565-1.570) increased behavioral or subjective trust (P=.99) compared with the no-frame condition. Even participants who were extremely certain of their own decisions (ie, 100% certain) commonly changed them in favor of the symptom checker's advice (19/34, 56%). Propensity to trust and eHealth literacy were associated with increased subjective trust in the symptom checker (propensity to trust b=0.25; eHealth literacy b=0.2), whereas sociodemographic variables showed no such link with either subjective or behavioral trust.

Conclusions: Contrary to our expectation, neither the anthropomorphic framing nor the emphasis on AI increased trust in symptom checker advice compared with a neutral control condition. However, independent of the interface, most participants trusted the mock app's advice, even when they were very certain of their own assessment. Thus, the question arises as to whether laypersons use such symptom checkers as substitutes rather than as aids in their own decision-making. With trust in symptom checkers already high at baseline, the benefit of symptom checkers depends on interface designs that enable users to adequately calibrate their trust levels during usage.

Trial registration: Deutsches Register Klinischer Studien DRKS00028561; https://tinyurl.com/rv4utcfb (retrospectively registered).
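
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' analysis: it recomputes the percentages from the counts given in the abstract and, under the assumption that the 95% CIs are Wald intervals (symmetric on the log-odds scale), checks that each odds ratio sits near the geometric midpoint of its interval. The abstract does not state how the intervals were derived, and the helper names `pct` and `wald_check` are hypothetical.

```python
import math

# All numbers below are copied from the abstract; no new data is introduced.
N_TOTAL = 494
GROUP_SIZES = {"anthropomorphic": 160, "ai": 161, "neutral": 173}
FOLLOWED_ADVICE = 384                     # participants who followed the advice
CERTAIN_TOTAL, CERTAIN_CHANGED = 34, 19   # 100%-certain participants

# Reported odds ratios with 95% CI bounds: (estimate, lower, upper)
REPORTED_OR = {
    "anthropomorphic": (1.120, 0.664, 1.897),
    "ai": (0.942, 0.565, 1.570),
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def wald_check(ci_low, ci_high, z=1.96):
    """ASSUMPTION: the CI is a Wald interval on the log-odds scale.

    Returns the odds ratio implied by the interval midpoint and the
    implied standard error of log(OR).
    """
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    implied_or = math.exp((log_low + log_high) / 2)
    se_log_or = (log_high - log_low) / (2 * z)
    return implied_or, se_log_or


if __name__ == "__main__":
    print("group sizes sum to N:", sum(GROUP_SIZES.values()) == N_TOTAL)
    for name, size in GROUP_SIZES.items():
        print(f"{name}: {pct(size, N_TOTAL)}%")
    print(f"followed advice: {pct(FOLLOWED_ADVICE, N_TOTAL)}%")
    print(f"certain but changed: {pct(CERTAIN_CHANGED, CERTAIN_TOTAL)}%")
    for name, (estimate, low, high) in REPORTED_OR.items():
        implied, se = wald_check(low, high)
        print(f"{name}: reported OR {estimate}, midpoint-implied OR "
              f"{implied:.3f}, implied SE of log(OR) {se:.3f}")
```

Both reported odds ratios agree with their interval midpoints to within rounding, and both intervals span 1, which is consistent with the conclusion that neither framing changed behavioral trust.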

Determinants of Laypersons' Trust in Medical Decision Aids: Randomized Controlled Trial
