Comparing ChatGPT's and Surgeon's Responses to Thyroid-related Questions From Patients
Siyin Guo,Ruicen Li,Genpeng Li,Wenjie Chen,Jing Huang,Linye He,Yu Ma,Liying Wang,Hongping Zheng,Chunxiang Tian,Yatong Zhao,Xinmin Pan,Hongxing Wan,Dasheng Liu,Zhihui Li,Jianyong Lei
DOI: https://doi.org/10.1210/clinem/dgae235
2024-04-10
The Journal of Clinical Endocrinology & Metabolism
Abstract: Context: For some common thyroid-related conditions with high prevalence and long follow-up times, ChatGPT can be used to respond to common thyroid-related questions. Objective: In this cross-sectional study, we assessed the ability of ChatGPT (version GPT-4.0) to provide accurate, comprehensive, compassionate, and satisfactory responses to common thyroid-related questions. Methods: First, we obtained 28 thyroid-related questions from the Huayitong app, which, together with 2 interfering questions, formed a set of 30 questions. Then, ChatGPT (on July 19, 2023), a junior specialist, and a senior specialist (on July 20, 2023) each answered these questions separately. Finally, 26 patients and 11 thyroid surgeons evaluated the responses on 4 dimensions: accuracy, comprehensiveness, compassion, and satisfaction. Results: Across the 30 questions and responses, ChatGPT's response speed was higher than that of the junior specialist (8.69 [7.53-9.48] vs 4.33 [4.05-4.60]; P < .001) and the senior specialist (8.69 [7.53-9.48] vs 4.22 [3.36-4.76]; P < .001). The word count of ChatGPT's responses was greater than that of both the junior specialist (341.50 [301.00-384.25] vs 74.50 [51.75-84.75]; P < .001) and the senior specialist (341.50 [301.00-384.25] vs 104.00 [63.75-177.75]; P < .001). ChatGPT received higher scores than the junior specialist and the senior specialist for accuracy, comprehensiveness, compassion, and satisfaction in responding to common thyroid-related questions. Conclusion: ChatGPT performed better than a junior specialist and a senior specialist in answering common thyroid-related questions, but further research is needed to validate the logical ability of ChatGPT for complex thyroid questions.
endocrinology & metabolism