Evaluating the Current Ability of ChatGPT to Assist in Professional Otolaryngology Education

Habib G Zalzal, Jenhao Cheng, Rahul K Shah
DOI: https://doi.org/10.1002/oto2.94
2023-11-22
OTO Open
Abstract:
Objective: To quantify ChatGPT's concordance with expert otolaryngologists when posed with high-level questions that require blending rote memorization and critical thinking.
Study design: Cross-sectional survey.
Setting: OpenAI's ChatGPT-3.5 platform.
Methods: Two board-certified otolaryngologists (HZ, RS) input 2 sets of 30 text-based questions (open-ended and single-answer multiple-choice) into the ChatGPT-3.5 model. Each otolaryngologist, working simultaneously with the AI model, rated the responses on a 3-point scale (correct, partially correct, incorrect). Interrater agreement percentages were analyzed under a binomial distribution to calculate 95% confidence intervals and perform significance tests. Statistical significance was defined as P < .05 for 2-sided tests.
Results: On open-ended questions, ChatGPT initially answered 56.7% of questions with complete accuracy and 86.7% with some accuracy (corrected agreement = 80.1%; P < .001). When the questions were repeated, ChatGPT improved to 73.3% with complete accuracy and 96.7% with some accuracy (corrected agreement = 88.8%; P < .001). On multiple-choice questions, ChatGPT performed substantially worse (43.3% correct).
Conclusion: ChatGPT currently does not provide reliably accurate responses to sophisticated questions in otolaryngology. Professional societies must be aware of the potential of this tool, prevent unscrupulous use during test-taking situations, and consider guidelines for clinical scenarios. Expert clinical oversight remains necessary across many use cases, given failure modes such as hallucination.
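To give a sense of how much uncertainty a 30-question sample carries, the following is a minimal Python sketch of a 95% binomial confidence interval for the reported accuracy percentages. It is illustrative only and does not reproduce the authors' analysis: the abstract states only that intervals were "based on binomial distribution" and applies them to interrater agreement, so the Wilson score method used here is an assumption, as are the question counts, which are back-computed from the reported percentages assuming 30 questions per set. The names `wilson_interval`, `N_QUESTIONS`, and `reported` are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% interval.
    Assumption: the paper does not say which binomial interval it used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    )
    return centre - half_width, centre + half_width


# Assumption: each question set contains 30 items, as stated in the Methods.
N_QUESTIONS = 30

# Percentages copied from the abstract; counts are inferred, not reported.
reported = {
    "open-ended, initial, complete accuracy": 56.7,
    "open-ended, repeat, complete accuracy": 73.3,
    "multiple-choice, correct": 43.3,
}

for label, pct in reported.items():
    count = round(pct / 100 * N_QUESTIONS)  # inferred number of questions
    low, high = wilson_interval(count, N_QUESTIONS)
    print(f"{label}: {count}/{N_QUESTIONS}, 95% CI {low:.1%} to {high:.1%}")
```

With only 30 questions per set, intervals computed this way are wide, which is worth keeping in mind when comparing the initial, repeat, and multiple-choice results.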