Evaluation of ChatGPT's performance in Medical Education: A Comparative Analysis with Students in a Pulmonology Examination

Cherif, H., Moussa, C., Ben Rjab, S., Mokaddem, S., Dhahri, B.
DOI: https://doi.org/10.1183/13993003.congress-2024.pa4378
IF: 24.3
2024-11-01
European Respiratory Journal
Abstract: The rapid evolution of ChatGPT has raised concerns in the field of medical education. Objective: To evaluate the performance of ChatGPT in a pneumology examination. Methodology: We conducted a cross-sectional comparative study involving two distinct groups. The first group comprised 244 third-year medical students who had taken the pneumology examination in 2020. The second group included two variants of ChatGPT-3.5: ChatGPT-V1 (without contextualization) and ChatGPT-V2 (enhanced with contextual information). The examination consisted of 9 multiple-choice questions (MCQs), 13 short open-ended questions (SOEQs), and 7 clinical cases. ChatGPT's responses to each question were compared with those of the students and analyzed according to the question's difficulty index (DI): a DI below 0.4 classifies a question as difficult, and a DI above 0.6 classifies it as easy. Results: ChatGPT-V1 showed strong proficiency in radiology, microbiology, and thoracic surgery, outperforming most of the medical students in these areas. However, it struggled in anatomopathology, pharmacology, and clinical pneumology. In contrast, ChatGPT-V2 consistently gave more accurate responses across the different question categories. ChatGPT performed worse than the students on MCQs, whereas ChatGPT-V2 excelled on SOEQs. Both versions, particularly ChatGPT-V2, outperformed the students on questions with a low or moderate DI, while the students performed better on questions with a high DI. ChatGPT-V1 did not pass the examination, whereas ChatGPT-V2 passed, scoring higher than 62.1% of the human candidates. Conclusion: Despite ChatGPT's access to online data, its performance closely resembled that of an average medical student.
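The abstract classifies questions by difficulty index but does not state how the index is computed. The minimal Python sketch below assumes the classical test-theory definition (the proportion of examinees who answer an item correctly) together with the 0.4 and 0.6 thresholds given in the abstract. It also shows one assumed way to express a result as "surpassing X% of candidates": the share of cohort scores strictly lower than the candidate's, so ties are not counted. All function names are hypothetical and not taken from the study.

```python
from typing import Sequence


def difficulty_index(correct_flags: Sequence[bool]) -> float:
    """Assumed classical DI: proportion of examinees answering the item correctly."""
    if not correct_flags:
        raise ValueError("need at least one response")
    # True counts as 1 and False as 0 when summed.
    return sum(correct_flags) / len(correct_flags)


def classify_difficulty(di: float) -> str:
    """Apply the abstract's thresholds: DI < 0.4 is difficult, DI > 0.6 is easy."""
    if di < 0.4:
        return "difficult"
    if di > 0.6:
        return "easy"
    # Values from 0.4 to 0.6 inclusive fall between the two strict thresholds.
    return "moderate"


def share_surpassed(candidate_score: float, cohort_scores: Sequence[float]) -> float:
    """Hypothetical helper: percentage of cohort scores strictly below the candidate's."""
    if not cohort_scores:
        raise ValueError("cohort is empty")
    below = sum(1 for s in cohort_scores if s < candidate_score)
    return 100.0 * below / len(cohort_scores)


if __name__ == "__main__":
    # Illustrative data only; these are not the study's responses or scores.
    item_responses = [True] * 30 + [False] * 70
    di = difficulty_index(item_responses)
    print(di, classify_difficulty(di))
    print(share_surpassed(12.0, [10.0, 11.0, 12.0, 15.0]))
```

Under this definition, an item answered correctly by 30 of 100 students has a DI of 0.3 and falls in the difficult band; a DI of exactly 0.4 or 0.6 lands in the moderate band because the abstract's thresholds are strict inequalities.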
respiratory system
What problem does this paper attempt to address?