Abstract e13628

Background: ChatGPT is a conversational artificial intelligence (AI) model trained on massive text-based datasets that responds to user input, often by completing tasks or answering questions. Recent studies have shown that ChatGPT can pass multiple medical licensing and specialty board examinations, showcasing its promising capabilities in the medical domain. Here, we investigated ChatGPT’s potential as a swift and reliable information source for medical oncologists using board examination-style questions and real patient cases.

Methods: We randomly selected 121 board-style questions from the American Society of Clinical Oncology Self-Evaluation Program (ASCO SEP). The questions were entered into ChatGPT as both multiple-choice (MC) and open-ended (OE) prompts. ChatGPT’s answers and explanations were evaluated for accuracy and concordance. A non-inferiority analysis was performed with 80% power at α = 0.05 and a non-inferiority margin of 70% correct answers, given the historical board examination pass rate of about 65% correct answers. For subgroup analysis, the questions were categorized by tested competency and primary tumor pathology. ChatGPT was also given 10 questions derived from real patient cases. We compared its responses with the answers provided by experienced oncologists to determine accuracy and practical applicability.

Results: ChatGPT answered 75 (62.0%) MC queries correctly. Among the correctly answered queries, 2 responses contained faulty explanations. Such inaccurate or discordant explanations were found in 26 of the 46 incorrectly answered queries. With OE prompts, ChatGPT answered 53 (43.8%) questions correctly, all with correct explanations. Of the 68 incorrect responses, 32 contained inaccurate or discordant explanations. Subgroup analysis suggested varying performance across the categories. The best performance was seen with malignant hematology (81.8% of MC and 72.8% of OE prompts answered correctly), while the weakest performance was seen with genitourinary malignancies (60% of MC and 20% of OE prompts answered correctly). For the real-world patient case questions, responses from ChatGPT and the clinicians were concordant for 5 of the 10 questions. None of the discordant responses contained inaccurate information, while 80% of the concordant responses contained sufficient detail to assist with patient management decisions.

Conclusions: ChatGPT’s performance fell short of the non-inferiority margin, highlighting the challenges of incorporating AI in the rapidly evolving field of medical oncology. Despite these limitations, ChatGPT’s partial success on both board-style and real-world patient care questions affirms its potential for clinical utility in the future.
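The comparison against the 70% margin can be illustrated with a short calculation. The abstract does not state which test statistic was used, so the sketch below makes two assumptions: a one-sided Wald (normal-approximation) lower confidence bound at α = 0.05, and a decision rule under which the margin is met only if that bound reaches 70%. The function names are hypothetical. Only the counts (75 and 53 of 121) and the margin come from the abstract.

```python
from math import sqrt
from statistics import NormalDist

MARGIN = 0.70   # non-inferiority margin from the abstract
TOTAL = 121     # number of ASCO SEP questions
ALPHA = 0.05    # one-sided significance level from the abstract


def lower_bound(correct: int, total: int, alpha: float = ALPHA) -> float:
    """One-sided (1 - alpha) Wald lower confidence bound for a proportion.

    Assumption: the abstract does not name its method; the Wald
    approximation is used here purely for illustration.
    """
    p = correct / total
    z = NormalDist().inv_cdf(1 - alpha)
    return p - z * sqrt(p * (1 - p) / total)


def meets_margin(correct: int, total: int, margin: float = MARGIN) -> bool:
    """Assumed decision rule: the lower bound must reach the margin."""
    return lower_bound(correct, total) >= margin


for label, correct in [("MC", 75), ("OE", 53)]:
    share = correct / TOTAL
    bound = lower_bound(correct, TOTAL)
    print(f"{label}: {share:.1%} correct, lower bound {bound:.1%}, "
          f"meets 70% margin: {meets_margin(correct, TOTAL)}")
```

Under these assumptions neither prompt format reaches the margin, which agrees with the stated conclusion. In both cases the observed proportion is already below 70%, so the result does not depend on the choice of interval method.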

Comparative analysis of ChatGPT and Bard in answering pathology examination questions requiring image interpretation

Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment

Evaluation of ChatGPT pathology knowledge using board-style questions

ChatGPT as a teaching tool: Preparing pathology residents for board examination with AI-generated digestive system pathology tests

The Accuracy of Artificial Intelligence ChatGPT in Oncology Examination Questions

Unveiling the risks of ChatGPT in diagnostic surgical pathology

Evaluation of ChatGPT’s Usefulness and Accuracy in Diagnostic Surgical Pathology

Assessment of Pathology Domain-Specific Knowledge of ChatGPT and Comparison to Human Performance

Applicability of ChatGPT in Assisting to Solve Higher Order Problems in Pathology

A Comparative Analysis of ChatGPT and Google’s AI’s “Bard” in Medicine

Evaluating the Performance of ChatGPT-4o Vision Capabilities on Image-Based USMLE Step 1, Step 2, and Step 3 Examination Questions

Navigating the path to precision: ChatGPT as a tool in pathology

Thinking like a pathologist: Morphologic approach to hepatobiliary tumors by ChatGPT

To Compare the Efficiency of ChatGPT and Bard in Medical Education: An Analysis of MCQ-Based Learning and Assessment

Human versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5, and Humans in Clinical Chemistry Multiple-Choice Questions

Precision of artificial intelligence in paediatric cardiology multimodal image interpretation

Exploring the Feasibility of Multimodal Chatbot AI as Copilot in Pathology Diagnostics: Generalist Model's Pitfall

Performance of ChatGPT-4 and Bard chatbots in responding to common patient questions on prostate cancer 177Lu-PSMA-617 therapy

Evaluation of ChatGPT as a diagnostic tool for medical learners and clinicians

Assessing ChatGPT's potential as a clinical resource for medical oncologists: An evaluation with board-style questions and real-world patient cases.