Abstract: Generative artificial intelligence (AI) has the potential to assist clinicians in responding to patients' messages.1 Although AI-generated responses were found to have acceptable quality with minimal risks of harm,2-4 the perspectives of laypeople toward AI responses have rarely been investigated despite their importance. Thus, we assessed laypersons' satisfaction with the responses of AI vs clinicians to patient messages. Additionally, we examined if the clinician-determined quality of AI responses was concordant with satisfaction.

In this cross-sectional study, out of 3 769 023 Patient Medical Advice Requests in electronic health records (EHRs), we screened 1089 clinical questions and included 59 messages (Figure). To mitigate possible selection bias, we developed and followed structured guidelines (eAppendix 1 in Supplement 1). Two generative AIs (ChatGPT-4, December 2023 version [OpenAI Inc] and Stanford Health Care and Stanford School of Medicine GPT, January 2024 version) created responses with and without prompt engineering (December 28, 2023, to January 31, 2024). Six licensed clinicians evaluated the AI and original clinician responses for information quality and empathy, using a 5-point Likert scale (with 1 indicating worst; 5, best). For satisfaction, 30 survey participants recruited through the Stanford Research Registry assessed the responses of AI (prompt-engineered Stanford GPT selected for highest quality AI responses) and clinicians (April 5 to June 10, 2024). Three individuals independently evaluated each response (with 1 being extremely dissatisfied; 5, extremely satisfied).5 To account for potential biases and variability of evaluators, we developed mixed models to compute effect estimates with standard errors for information quality, empathy, and satisfaction.
We examined the association of response length with satisfaction using a multivariable linear regression accounting for age, sex, race, and ethnicity, where statistical significance was at P < .05. Analyses were conducted with SAS version 9.4 (SAS Institute Inc). We followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline. The institutional review board at Stanford University approved this study. We obtained systematically deidentified patient messages using Safe Harbor, and 2 researchers additionally verified no protected health information was present.

A total of 2118 assessments for AI response quality and 408 assessments for satisfaction were included (Figure). Satisfaction estimates were higher with AI responses (3.96 [SE, 0.09]) than with clinicians' responses (3.05 [SE, 0.09]), both overall (P < .001) and by specialty (Table). Satisfaction was highest with responses to cardiology questions from AI (4.09 [SE, 0.14]) while information quality and empathy were highest with responses to endocrinology questions. Clinicians' responses were shorter (mean [SD] 254.37 [198.85] total characters) than AI responses (mean [SD] 1470.77 [391.83] total characters). Clinicians' response length was associated with satisfaction overall (β = 0.23; P = .002) and in cardiovascular questions (β = 0.31; P = .02) while AI response length was not.

To our knowledge, this is the first study to assess satisfaction with AI-generated responses to patient-created medical questions in EHR. Satisfaction was consistently higher with AI-generated responses than with clinicians overall and by specialty. However, satisfaction was not necessarily concordant with the clinician-determined information quality and empathy. For example, satisfaction was highest with AI responses to cardiology questions while information quality and empathy were highest in endocrinology questions.
Interestingly, clinicians' response length was associated with satisfaction while AI's response length was not. The findings suggest that the extreme brevity of responses could be a factor that lowers satisfaction in patient-clinician communication in EHR. Study limitations include that satisfaction was assessed by survey participants rather than the patients who submitted the questions. Although original patients' satisfaction might differ from that of survey participants, this study can provide the closest proxy of patients' perspectives toward AI-generated responses. Future directions of the study include assessing satisfaction with AI responses in other settings (eg, regions and types of medical centers), study populations (eg, language and culture), and with larger samples across diverse specialties. Our study highlights the importance of incorporating patients as key stakeholders in the development and implementation of AI in patient-clinician communications to optimally integrate AI into practice.6

Accepted for Publication: August 15, 2024. Published: October 16 -Abstract Truncated-
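
The overall satisfaction estimates and response lengths reported in the abstract can be put side by side with a little arithmetic. The sketch below is a rough back-of-envelope check, not the study's analysis: it assumes the two satisfaction estimates are independent and uses a normal approximation, whereas the authors fitted mixed models in SAS that account for evaluator variability. The function name `rough_comparison` and the dictionary keys are illustrative only.

```python
import math

# Values reported in the abstract (overall satisfaction estimates with SEs,
# and mean response lengths in total characters).
AI_SATISFACTION, AI_SE = 3.96, 0.09
CLIN_SATISFACTION, CLIN_SE = 3.05, 0.09
AI_MEAN_CHARS = 1470.77
CLIN_MEAN_CHARS = 254.37


def rough_comparison():
    """Back-of-envelope comparison; NOT the study's mixed-model analysis."""
    # Difference between the two reported satisfaction estimates.
    diff = AI_SATISFACTION - CLIN_SATISFACTION
    # Assumption: the estimates are independent, so their variances add.
    se_diff = math.sqrt(AI_SE ** 2 + CLIN_SE ** 2)
    z = diff / se_diff
    # Two-sided P value under a normal approximation (an assumption).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    # How many times longer the mean AI response was than the mean clinician response.
    length_ratio = AI_MEAN_CHARS / CLIN_MEAN_CHARS
    return {
        "difference": diff,
        "se_difference": se_diff,
        "z": z,
        "p_two_sided": p_two_sided,
        "length_ratio": length_ratio,
    }


if __name__ == "__main__":
    for name, value in rough_comparison().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions the difference is 0.91 points on the 5-point scale with a z statistic of roughly 7, which is consistent with the reported P < .001, and the mean AI response was about 5.8 times as long as the mean clinician response.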

Doctor AI? A pilot study examining responses of artificial intelligence to common questions asked by geriatric patients

Perspectives on Artificial Intelligence–Generated Responses to Patient Messages

[Rare disease in the age of artificial intelligence]

Artificial intelligence chatbot vs pathology faculty and residents: Real-world clinical questions from a genitourinary treatment planning conference

The doc versus the bot: A pilot study to assess the quality and accuracy of physician and chatbot responses to clinical questions in gynecologic oncology

Is My Doctor Human? Acceptance of AI among Patients with Breast Cancer

"Doctor ChatGPT, Can You Help Me?" The Patient's Perspective: Cross-Sectional Study

Patients' Perceptions Toward Human-Artificial Intelligence Interaction in Health Care: Experimental Study

The AI Will See You Now: Feasibility and Acceptability of a Conversational AI Medical Interviewing System

Acceptance of clinical artificial intelligence among physicians and medical students: A systematic review with cross-sectional survey

Public perceptions of artificial intelligence in healthcare: ethical concerns and opportunities for patient-centered care

Accuracy of Prospective Assessments of 4 Large Language Model Chatbot Responses to Patient Questions About Emergency Care: Experimental Comparative Study

Building Trust: Developing an Ethical Communication Framework for Navigating Artificial Intelligence Discussions and Addressing Potential Patient Concerns

Artificial Intelligence-Powered Surgical Consent: Patient Insights

Assessing the Accuracy, Completeness, and Reliability of Artificial Intelligence-Generated Responses in Dentistry: A Pilot Study Evaluating the ChatGPT Model

Comparing ChatGPT and a Single Anesthesiologist's Responses to Common Patient Questions: An Exploratory Cross-Sectional Survey of a Panel of Anesthesiologists

The future of AI clinicians: assessing the modern standard of chatbots and their approach to diagnostic uncertainty

Geriatrics and artificial intelligence in Spain (Ger-IA project): talking to ChatGPT, a nationwide survey

Patient Perspectives on AI for Mental Health Care: Cross-Sectional Survey Study

Navigating the doctor-patient-AI relationship - a mixed-methods study of physician attitudes toward artificial intelligence in primary care

Artificial Intelligence in Primary Health Care: Perceptions, Issues, and Challenges