Applying GPT-4 to the Plastic Surgery Inservice Training Examination

Rohun Gupta,John B Park,Isabel Herzog,Nahid Yosufi,Amelia Mangan,Peter K Firouzbakht,Brian A Mailey
DOI: https://doi.org/10.1016/j.bjps.2023.09.027
Abstract:Background: The recent introduction of Generative Pre-trained Transformer (GPT)-4 has demonstrated the potential to be a superior version of ChatGPT-3.5. According to many, GPT-4 is seen as a more reliable and creative version of GPT-3.5. Objective: In conjugation with our prior manuscript, we wanted to determine if GPT-4 could be exploited as an instrument for plastic surgery graduate medical education by evaluating its performance on the Plastic Surgery Inservice Training Examination (PSITE). Methods: Sample assessment questions from the 2022 PSITE were obtained from the American Council of Academic Plastic Surgeons website and manually inputted into GPT-4. Responses by GPT-4 were qualified using the properties of natural coherence. Incorrect answers were stratified into the consequent categories: informational, logical, or explicit fallacy. Results: From a total of 242 questions, GPT-4 provided correct answers for 187, resulting in a 77.3% accuracy rate. Logical reasoning was utilized in 95.0% of questions, internal information in 98.3%, and external information in 97.5%. Upon separating the questions based on incorrect and correct responses, a statistically significant difference was identified in GPT-4's application of logical reasoning. Conclusion: GPT-4 has shown to be more accurate and reliable for plastic surgery resident education when compared to GPT-3.5. Users should look to utilize the tool to enhance their educational curriculum. Those who adopt the use of such models may be better equipped to deliver high-quality care to their patients.
What problem does this paper attempt to address?