Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study

Giacomo Rossettini, Lia Rodeghiero, Federica Corradi, Chad Cook, Paolo Pillastrini, Andrea Turolla, Greta Castellini, Stefania Chiappinotto, Silvia Gianola, Alvisa Palese
DOI: https://doi.org/10.1186/s12909-024-05630-9
IF: 3.263
2024-06-27
BMC Medical Education
Abstract: Artificial intelligence (AI) chatbots are emerging educational tools for students in healthcare science. However, their accuracy must be assessed before they are adopted in educational settings. This study aimed to assess how accurately three AI chatbots (ChatGPT-4, Microsoft Copilot and Google Gemini) predict the correct answers in the Italian standardized entrance examination for healthcare science degrees (CINECA test). Secondarily, we assessed the narrative coherence of the AI chatbots' responses (i.e., text output) based on three qualitative metrics: the logical rationale behind the chosen answer, the presence of information internal to the question, and the presence of information external to the question.
education & educational research; education, scientific disciplines