A comparative analysis of AI-based chatbots: Assessing data quality in orthognathic surgery related patient information

Ebru Yurdakurban, Kübra Gülnur Topsakal, Gökhan Serhat Duran
DOI: https://doi.org/10.1016/j.jormas.2023.101757
Abstract: Introduction: The aim of the current study is to evaluate the quality, reliability, readability, and similarity of data provided by different AI-based chatbots in the field of orthognathic surgery. Materials and methods: Guidelines on orthognathic surgery were reviewed, and two researchers produced a list of questions for patients to ask chatbots. The questions were categorized into 'General Information and Procedure' and 'Results and Recovery', with 30 questions in each category. Five scoring criteria were used to evaluate the chatbot responses to the 60 questions: the Ensuring Quality Information for Patients (EQIP) tool, the Reliability Scoring System (adapted from DISCERN), the Global Quality Scale (GQS), the Simple Measure of Gobbledygook (SMOG), and the Similarity Index. Results: OpenEvidence had the highest mean values for the EQIP tool, SMOG, and Similarity Index, while MediSearch had the highest values for the Reliability and GQS criteria. In terms of reliability and quality, all three AI-based chatbots demonstrated high reliability and good quality; however, based on the SMOG index, their responses required at least a college-level education to read. Additionally, in the similarity assessment, ChatGPT-4 displayed high originality, while OpenEvidence exhibited a high degree of similarity. Conclusion: AI-based chatbots with a variety of features generally provided answers that were of high quality and reliability but difficult to read. Although the medical information on orthognathic surgery provided by chatbots is of high quality, individuals are advised to consult their healthcare professionals on this issue.