Assessing the Accuracy of AI Models in Orthodontic Knowledge: A Comparative Study Between ChatGPT-4 and Google Bard

Sadia Naureen, Huma Ghazanfar Kiani
DOI: https://doi.org/10.29271/jcpsp.2024.07.761
Abstract
Objective: To compare the accuracy of ChatGPT-4 and Google Bard in answering knowledge-based questions on orthodontic diagnosis and treatment modalities.
Study Design: Cross-sectional comparative study.
Place and Duration of the Study: Department of Orthodontics, Rawal Institute of Health Sciences, Islamabad, Pakistan, from June 23 to August 30, 2023.
Methodology: A comprehensive content analysis was designed around three topics: mini implant-assisted rapid palatal expansion (MARPE), clear aligners (CA), and cone beam computed tomography (CBCT). Thirty questions per category (90 in total) were derived from recent review articles and presented to two large language models (LLMs), Google Bard and ChatGPT-4. Two independent raters scored the accuracy of each response on a scale of one to five by comparing it with a standard answer key. The performance of the two models was compared using statistical analyses, including the paired-sample t-test.
Results: GPT-4 significantly outperformed Google Bard in all three categories (MARPE, CBCT, and CA) and achieved a higher mean score, with p = 0.001 for MARPE and CBCT and p = 0.013 for CA. Overall, GPT-4 achieved a total score of 92.6%, compared with 72% for Google Bard.
Conclusion: GPT-4 outperforms Google Bard in providing accurate and up-to-date information on recent trends in orthodontic treatment modalities.
Key words: Aligners, Cone beam computed tomography, ChatGPT-4, Google Bard, Mini implant-assisted rapid palatal expansion.
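To make the comparison in the Methodology concrete, the sketch below computes a paired-sample t statistic on per-question scores from two models, plus a "percent of maximum" summary. It is an illustration under stated assumptions, not the authors' analysis: the scores are hypothetical (not the study's data), and expressing a total score as the sum of 1-to-5 scores divided by the maximum possible total is an assumed formula, since the abstract does not say how the percentages were derived. The p-value would then come from a t distribution with n - 1 degrees of freedom (for example, via `scipy.stats.ttest_rel`); only the standard library is used here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a: list[float], b: list[float]) -> tuple[float, int]:
    """Return (t, degrees_of_freedom) for a paired-sample t-test of a vs. b.

    t = mean(d) / (sd(d) / sqrt(n)), where d[i] = a[i] - b[i] and sd is the
    sample standard deviation (n - 1 denominator).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(n)), n - 1


def percent_of_maximum(scores: list[int], max_score: int = 5) -> float:
    """Express summed 1-5 scores as a percentage of the maximum possible total.

    ASSUMPTION: one plausible way to report a 'total score %'; the abstract
    does not state the authors' exact formula.
    """
    return 100 * sum(scores) / (max_score * len(scores))


# HYPOTHETICAL per-question scores (1-5) for one category; NOT the study's data.
gpt4_scores = [5, 5, 4, 5, 3, 5, 4, 5]
bard_scores = [4, 3, 4, 2, 3, 4, 3, 5]

t, df = paired_t_statistic(gpt4_scores, bard_scores)
print(f"t = {t:.3f}, df = {df}")
print(f"GPT-4: {percent_of_maximum(gpt4_scores):.1f}%  "
      f"Bard: {percent_of_maximum(bard_scores):.1f}%")
```

With these made-up scores the script prints `t = 2.646, df = 7` and `GPT-4: 90.0%  Bard: 70.0%`. A paired test fits this design because both models answer the same questions, so each question acts as its own control.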