Quality of Chatbot Information Related to Benign Prostatic Hyperplasia

Christopher J. Warren, Nicolette G. Payne, Victoria S. Edmonds, Sandeep S. Voleti, Mouneeb M. Choudry, Nahid Punjani, Haider M. Abdul‐Muhsin, Mitchell R. Humphreys
DOI: https://doi.org/10.1002/pros.24814
2024-11-09
The Prostate
Abstract: Background: Large language model (LLM) chatbots, a form of artificial intelligence (AI) that excels at prompt‐based interactions and mimics human conversation, have emerged as a tool for providing patients with information about urologic conditions. We aimed to examine the quality of information related to benign prostatic hyperplasia surgery from four chatbots and how they would respond to sample patient messages. Methods: We identified the top three queries in Google Trends related to "treatment for enlarged prostate." These were entered into ChatGPT (OpenAI), Bard (Google), Bing AI (Microsoft), and Doximity GPT (Doximity), both unprompted and prompted for specific criteria (optimized). The chatbot‐provided answers to each query were evaluated for overall quality by three urologists using the DISCERN instrument. Readability was measured with the built‐in Flesch–Kincaid reading level tool in Microsoft Word. To assess the ability of chatbots to answer patient questions, we prompted the chatbots with a clinical scenario related to holmium laser enucleation of the prostate, followed by 10 questions that the National Institutes of Health recommends patients ask before surgery. Accuracy and completeness of responses were graded with Likert scales. Results: Without prompting, the quality of information was moderate across all chatbots but improved significantly with prompting (mean [SD], 3.3 [1.2] vs. 4.4 [0.7] out of 5; p
endocrinology & metabolism, urology & nephrology