Prompt engineering with ChatGPT3.5 and GPT4 to improve patient education on retinal diseases

Hoyoung Jung,Jean Oh,Kirk A J Stephenson,Aaron W Joe,Zaid N Mammo
DOI: https://doi.org/10.1016/j.jcjo.2024.08.010
2024-09-05
Abstract:Objective: To assess the effect of prompt engineering on the accuracy, comprehensiveness, readability, and empathy of large language model (LLM)-generated responses to patient questions regarding retinal disease. Design: Prospective qualitative study. Participants: Retina specialists, ChatGPT3.5, and GPT4. Methods: Twenty common patient questions regarding 5 retinal conditions were inputted to ChatGPT3.5 and GPT4 as a stand-alone question or preceded by an optimized prompt (prompt A) or preceded by prompt A with specified limits to length and grade reading level (prompt B). Accuracy and comprehensiveness were graded by 3 retina specialists on a Likert scale from 1 to 5 (1: very poor to 5: very good). Readability of responses was assessed using Readable.com, an online readability tool. Results: There were no significant differences between ChatGPT3.5 and GPT4 across any of the metrics tested. Median accuracy of responses to a stand-alone question, prompt A, and prompt B questions were 5.0, 5.0, and 4.0, respectively. Median comprehensiveness of responses to a stand-alone question, prompt A, and prompt B questions were 5.0, 5.0, and 4.0, respectively. The use of prompt B was associated with a lower accuracy and comprehensiveness than responses to stand-alone question or prompt A questions (p < 0.001). Average-grade reading level of responses across both LLMs were 13.45, 11.5, and 10.3 for a stand-alone question, prompt A, and prompt B questions, respectively (p < 0.001). Conclusions: Prompt engineering can significantly improve readability of LLM-generated responses, although at the cost of reducing accuracy and comprehensiveness. Further study is needed to understand the utility and bioethical implications of LLMs as a patient educational resource.
What problem does this paper attempt to address?