Abstract:Abstract Purpose Chat Generative Pre-trained Transformer (ChatGPT) is a large language artificial intelligence (AI) model which generates contextually relevant text in response to questioning. After ChatGPT successfully passed the United States Medical Licensing Examinations, proponents have argued it should play an increasing role in medical service provision and education. AI in healthcare remains in its infancy, and the reliability of AI systems must be scrutinized. This study assessed whether ChatGPT could pass Section 1 of the Fellowship of the Royal College of Surgeons (FRCS) examination in Trauma and Orthopaedic Surgery. Methods The UK and Ireland In-Training Examination (UKITE) was used as a surrogate for the FRCS. Papers 1 and 2 of UKITE 2022 were directly inputted into ChatGPT. All questions were in a single-best-answer format without wording alterations. Imaging was trialled to ensure ChatGPT utilized this information. Results ChatGPT scored 35.8%: 30% lower than the FRCS pass rate and 8.2% lower than the mean score achieved by human candidates of all training levels. Subspecialty analysis demonstrated ChatGPT scored highest in basic science (53.3%) and lowest in trauma (0%). In 87 questions answered incorrectly, ChatGPT only stated it did not know the answer once and gave incorrect explanatory answers for the remaining questions. Conclusion ChatGPT is currently unable to exert the higher-order judgement and multilogical thinking required to pass the FRCS examination. Further, the current model fails to recognize its own limitations. ChatGPT’s deficiencies should be publicized equally as much as its successes to ensure clinicians remain aware of its fallibility. Key messages What is already known on this topic Following ChatGPT’s much-publicized success in passing the United States Medical Licensing Examinations, clinicians and medical students are using the model increasingly frequently for medical service provision and education. However ChatGPT remains in its infancy, and the model’s reliability and accuracy remain unproven. What this study adds This study demonstrates ChatGPT is currently unable to exert the higher-order judgement and multilogical thinking required to pass the Fellowship of the Royal College of Surgeons (FRCS) (Trauma & Orthopaedics) examination. Further, the current model fails to recognize its own limitations when offering both direct and explanatory answers. How this study might affect research, practice, or policy This study highlights the need for medical students and clinicians to exert caution when employing ChatGPT as a revision tool or applying it in clinical practice, and for patients to be aware of its fallibilities when using it as a health resource. Future research questions include:

Educational Limitations of ChatGPT in Neurosurgery Board Preparation

Performance of ChatGPT on the MCAT: The Road to Personalized and Equitable Premedical Learning

Large Language Model-Based Neurosurgical Evaluation Matrix: A Novel Scoring Criteria to Assess the Efficacy of ChatGPT as an Educational Tool for Neurosurgery Board Preparation

A Quantitative Assessment of ChatGPT as a Neurosurgical Triaging Tool

Performance of ChatGPT on American Board of Surgery In-Training Examination Preparation Questions

Inadequate Performance of ChatGPT on Orthopedic Board-Style Written Exams

An Exploratory Analysis of ChatGPT Compared to Human Performance With the Anesthesiology Oral Board Examination: Initial Insights and Implications

Usefulness and Accuracy of Artificial Intelligence Chatbot Responses to Patient Questions for Neurosurgical Procedures

Is ChatGPT smarter than Otolaryngology trainees? A comparison study of board style exam questions

Is ChatGPT 3.5 smarter than Otolaryngology trainees? A comparison study of board style exam questions

Evaluating ChatGPT-4 in medical education: an assessment of subject exam performance reveals limitations in clinical curriculum support for students

Evaluating Performance of ChatGPT on MKSAP Cardiology Board Review Questions

Evaluation of ChatGPT pathology knowledge using board-style questions

The Accuracy of Artificial Intelligence ChatGPT in Oncology Examination Questions

Evaluating the Current Ability of ChatGPT to Assist in Professional Otolaryngology Education

ChatGPT as a teaching tool: Preparing pathology residents for board examination with AI-generated digestive system pathology tests

Evaluating the limits of AI in medical specialisation: ChatGPT’s performance on the UK Neurology Specialty Certificate Examination

Artificial intelligence in orthopaedics: can Chat Generative Pre-trained Transformer (ChatGPT) pass Section 1 of the Fellowship of the Royal College of Surgeons (Trauma & Orthopaedics) examination?

Capabilities and Limitations of ChatGPT in Anatomy Education: An Interaction With ChatGPT

Evaluating ChatGPT Performance on the Orthopaedic In-Training Examination

Performance of ChatGPT on Solving Orthopedic Board-Style Questions: A Comparative Analysis of ChatGPT 3.5 and ChatGPT 4