Abstract

Purpose
Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence (AI) large language model that generates contextually relevant text in response to user questions. After ChatGPT passed the United States Medical Licensing Examinations, proponents argued that it should play an increasing role in medical service provision and education. However, AI in healthcare remains in its infancy, and the reliability of AI systems must be scrutinized. This study assessed whether ChatGPT could pass Section 1 of the Fellowship of the Royal College of Surgeons (FRCS) examination in Trauma and Orthopaedic Surgery.

Methods
The UK and Ireland In-Training Examination (UKITE) was used as a surrogate for the FRCS. Papers 1 and 2 of UKITE 2022 were entered directly into ChatGPT. All questions were in single-best-answer format and were entered without wording alterations. Imaging was trialled to ensure that ChatGPT utilized this information.

Results
ChatGPT scored 35.8%, which was 30% lower than the FRCS pass mark and 8.2% lower than the mean score achieved by human candidates of all training levels. Subspecialty analysis showed that ChatGPT scored highest in basic science (53.3%) and lowest in trauma (0%). Of the 87 questions it answered incorrectly, ChatGPT stated that it did not know the answer only once; for the remaining questions, it gave incorrect explanatory answers.

Conclusion
ChatGPT is currently unable to exercise the higher-order judgement and multilogical thinking required to pass the FRCS examination. Furthermore, the current model fails to recognize its own limitations. ChatGPT's deficiencies should be publicized as widely as its successes to ensure that clinicians remain aware of its fallibility.

Key messages

What is already known on this topic
Following ChatGPT's much-publicized success in passing the United States Medical Licensing Examinations, clinicians and medical students are increasingly using the model for medical service provision and education. However, ChatGPT remains in its infancy, and its reliability and accuracy are unproven.

What this study adds
This study demonstrates that ChatGPT is currently unable to exercise the higher-order judgement and multilogical thinking required to pass the Fellowship of the Royal College of Surgeons (FRCS) (Trauma & Orthopaedics) examination. Furthermore, the current model fails to recognize its own limitations when offering both direct and explanatory answers.

How this study might affect research, practice, or policy
This study highlights the need for medical students and clinicians to exercise caution when using ChatGPT as a revision tool or applying it in clinical practice, and for patients to be aware of its fallibility when using it as a health resource. Future research questions include: