Abstract: Background: The transformative potential of artificial intelligence (AI) in higher education is evident, with conversational models like ChatGPT poised to reshape teaching and assessment methods. The rapid evolution of AI models requires continuous evaluation. AI-based models can offer personalized learning experiences but raise accuracy concerns. Multiple-choice questions (MCQs) are widely used for competency assessment. The aim of this study was to evaluate ChatGPT's performance on medical microbiology MCQs compared with student performance. Methods: The study employed an 80-MCQ dataset from a 2021 medical microbiology exam in the Medical Microbiology 2 course of the Doctor of Dental Surgery (DDS) program at the University of Jordan. The exam contained 40 midterm and 40 final MCQs, authored by a single instructor without copyright issues. The MCQs were categorized based on the revised Bloom's Taxonomy into four categories: Remember, Understand, Analyze, or Evaluate. Metrics, including facility index and discriminative efficiency, were derived from the performance of 153 DDS students on the midterm exam and 154 on the final exam. ChatGPT 3.5 was used to answer the questions, and its responses were assessed for correctness and clarity by two independent raters. Results: ChatGPT 3.5 correctly answered 64 out of 80 medical microbiology MCQs (80%) but scored below the student average (80.5/100 vs. 86.21/100). Incorrect ChatGPT responses were more common in MCQs with longer choices (p = 0.025). ChatGPT 3.5 performance varied across cognitive domains: Remember (88.5% correct), Understand (82.4% correct), Analyze (75% correct), Evaluate (72% correct), with no statistically significant differences (p = 0.492). Correct ChatGPT responses received significantly higher average clarity and correctness scores than incorrect responses. Conclusion: The study findings emphasize the need for ongoing refinement and evaluation of ChatGPT performance. ChatGPT 3.5 showed the potential to answer medical microbiology MCQs correctly and clearly; nevertheless, its performance was below that of the students. Variability in ChatGPT performance across cognitive domains should be considered in future studies. The study insights could contribute to the ongoing evaluation of the role of AI-based models in educational assessment and to the augmentation of traditional methods in higher education.

ChatGPT 3.5 fails to write appropriate multiple choice practice exam questions

ChatGPT Needs a Chemistry Tutor Too

Analyzing ChatGPT's Aptitude in an Introductory Computer Engineering Course

Investigating the Use of an Artificial Intelligence Chatbot with General Chemistry Exam Questions

Examining the Validity and Reliability of ChatGPT 3.5-Generated Reading Comprehension Questions for Academic Texts

Below average ChatGPT performance in medical microbiology exam compared to university students

The conversational AI "ChatGPT" outperforms medical students on a physiology university examination

Is ChatGPT 'ready' to be a learning tool for medical undergraduates and will it perform equally in different subjects? Comparative study of ChatGPT performance in tutorial and case-based learning questions in physiology and biochemistry

Befriending ChatGPT and other superchatbots: An AI-integrated take-home assessment preserving integrity

ChatGPT versus engineering education assessment: a multidisciplinary and multi-institutional benchmarking and analysis of this generative artificial intelligence tool to investigate assessment integrity

Benefits and Risks of Using ChatGPT4 as a Teaching Assistant for Computer Science Students

ChatGPT Participates in a Computer Science Exam

ChatGPT is a Breakthrough in Science and Education but Fails a Test in Sports and Exercise Psychology

Students' Experiences of Using ChatGPT in an Undergraduate Programming Course

Evaluation of ChatGPT's performance in Medical Education: A Comparative Analysis with Students in a Pulmonology Examination

ChatGPT in the classroom. Exploring its potential and limitations in a Functional Programming course

ChatGPT as a teaching tool: Preparing pathology residents for board examination with AI-generated digestive system pathology tests

ChatGPT is not a pocket calculator -- Problems of AI-chatbots for teaching Geography

ChatGPT & Mechanical Engineering: Examining performance on the FE Mechanical Engineering and Undergraduate Exams

Advantages and pitfalls in utilizing artificial intelligence for crafting medical examinations: a medical education pilot study with GPT-4

ChatGPT Goes to Law School