Appropriateness and readability of Google Bard and ChatGPT-3.5 generated responses for surgical treatment of glaucoma

Parul Ichhpujani, Uday Pratap Singh Parmar, Suresh Kumar
DOI: https://doi.org/10.22336/rjo.2024.45
Abstract:
Aim: To evaluate the appropriateness and readability of the medical knowledge provided by ChatGPT-3.5 and Google Bard, artificial-intelligence-powered conversational search engines, regarding surgical treatment for glaucoma.
Methods: In this retrospective, cross-sectional study, 25 common questions related to the surgical management of glaucoma were posed to ChatGPT-3.5 and Google Bard. Glaucoma specialists graded the appropriateness of the responses, and readability was assessed with several readability scores.
Results: Appropriate answers to the posed questions were obtained in 68% of the responses with Google Bard and 96% with ChatGPT-3.5. On average, the responses generated by Google Bard had a significantly lower proportion of sentences with more than 30 and more than 20 syllables (23% and 52%, respectively) than those generated by ChatGPT-3.5 (66% and 82%, respectively). Google Bard had significantly (p<0.0001) lower readability grade scores and a significantly higher Flesch Reading Ease Score, implying that its answers were easier to read.
Discussion: Many patients and their families turn to LLM chatbots for information, which makes clear and accurate content essential. Assessments of online glaucoma information have shown variability in quality and readability, with institutional websites generally performing better than private ones. We found that ChatGPT-3.5, while precise, has lower readability than Google Bard, which is more accessible but less precise. For example, the Flesch Reading Ease Score was 57.6 for Google Bard and 22.6 for ChatGPT, indicating that Google Bard's content is easier to read. Moreover, the Gunning Fog Index scores suggested that Google Bard's text is more suitable for a broader audience. ChatGPT's knowledge is limited to data up to 2021, whereas Google Bard, trained with real-time data, offers more current information. Further research is needed to evaluate these tools across various medical topics.
Conclusion: The answers generated by ChatGPT-3.5™ are more accurate than those given by Google Bard. However, ChatGPT-3.5™ answers may be difficult for members of the public with glaucoma to comprehend. This study emphasizes the importance of verifying the accuracy and clarity of the online information that glaucoma patients rely on to make informed decisions about their ocular health. This is an exciting new area for patient education and health literacy.
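For readers unfamiliar with the two readability measures named in the abstract, the sketch below applies the standard published formulas for the Flesch Reading Ease Score and the Gunning Fog Index. The abstract does not say which tool the authors used to compute their scores, so the function names and the input counts here are hypothetical and serve only to illustrate how the scores respond to sentence length and word complexity.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def gunning_fog(total_words, total_sentences, complex_words):
    """Standard Gunning Fog Index; roughly the years of schooling needed.

    complex_words is the number of words with three or more syllables.
    """
    words_per_sentence = total_words / total_sentences
    percent_complex = 100.0 * complex_words / total_words
    return 0.4 * (words_per_sentence + percent_complex)


# Hypothetical counts for two 100-word passages (not data from the study).
# "plain" uses shorter sentences and fewer long words than "dense".
plain_fre = flesch_reading_ease(total_words=100, total_sentences=5, total_syllables=150)
dense_fre = flesch_reading_ease(total_words=100, total_sentences=3, total_syllables=190)
plain_fog = gunning_fog(total_words=100, total_sentences=5, complex_words=10)
dense_fog = gunning_fog(total_words=100, total_sentences=3, complex_words=30)

print(f"plain: FRE={plain_fre:.1f}, Fog={plain_fog:.1f}")
print(f"dense: FRE={dense_fre:.1f}, Fog={dense_fog:.1f}")
```

Under these formulas, longer sentences and more polysyllabic words lower the Flesch Reading Ease Score and raise the Gunning Fog Index, which is the pattern the abstract reports for ChatGPT-3.5 relative to Google Bard.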