Are large language models a useful resource to address common patient concerns on hallux valgus? A readability analysis

William J Hlavinka, Tarun R Sontam, Anuj Gupta, Brett J Croen, Mohammed S Abdullah, Casey J Humbyrd
DOI: https://doi.org/10.1016/j.fas.2024.08.002
2024-08-06
Abstract: Background: This study evaluates the accuracy and readability of responses from Google, ChatGPT-3.5, and ChatGPT-4.0 (the latter two being versions of an artificial intelligence model) to common questions regarding bunion surgery. Methods: A Google search of "bunionectomy" was performed, and the first ten questions under "People Also Ask" were recorded. ChatGPT-3.5 and 4.0 were asked these ten questions individually, and their answers were analyzed using the Flesch-Kincaid Reading Ease and Gunning-Fog Level algorithms. Results: Compared to Google, ChatGPT-3.5 and 4.0 had larger word counts, with 315 ± 39 words (p < .0001) and 294 ± 39 words (p < .0001), respectively. A significant difference was found between ChatGPT-3.5 and 4.0 compared to Google using Flesch-Kincaid Reading Ease (p < .0001). Conclusions: Our findings demonstrate that ChatGPT provided significantly lengthier responses than Google and that there was a significant difference in reading ease. Both platforms exceeded the seventh- to eighth-grade reading level of the U.S. Level of evidence: N/A.
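For readers unfamiliar with the two readability measures named in the Methods, the following is a minimal sketch of how such scores are commonly computed. It uses the standard published formulas for Flesch Reading Ease and the Gunning Fog index. The syllable counter, the sentence splitter, and all function names are illustrative assumptions; the abstract does not state which tool or implementation the authors used, so scores from this sketch may differ from theirs.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the study's method):
    count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def split_text(text: str):
    """Naive tokenizer: sentences end at . ! or ?; words are letter runs."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return sentences, words


def flesch_reading_ease(text: str) -> float:
    """Standard formula: higher scores mean easier text."""
    sentences, words = split_text(text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))


def gunning_fog(text: str) -> float:
    """Standard formula: the score approximates a U.S. school grade level.
    'Complex' words are taken here as those with three or more syllables."""
    sentences, words = split_text(text)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) / len(sentences)
                  + 100 * complex_words / len(words))
```

Calling `flesch_reading_ease(answer)` and `gunning_fog(answer)` on each response would yield per-answer scores that could then be compared across sources. Dedicated readability tools typically use more careful syllable and sentence detection than this heuristic.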