Abstract e13654
Background: The integration of Natural Language Processing (NLP) into healthcare holds tremendous promise. ChatGPT-4.0 (OpenAI, San Francisco, CA) is a widely recognized large language model that can comprehend and generate text, answer questions, and perform other language-related tasks. However, pitfalls and errors have been described in its clinical application. We explored the ability of ChatGPT-4.0 (ChatGPT) to guide clinical decision-making in 6 gastrointestinal cancers using the National Comprehensive Cancer Network (NCCN) Clinical Practice Guidelines as a framework.
Methods: We reviewed the NCCN Guidelines for Anal Squamous Cell Carcinoma (AN), Small Bowel Adenocarcinoma (SB), Ampullary Adenocarcinoma (AA), Biliary Tract Cancers (BT), Pancreatic Adenocarcinoma (PN), and Gastric Cancer (GA). Up to 2 clinical questions were designed for each decision-making page. Questions were categorized by type (Workup, Treatment, Surveillance, Diagnostics, or References). ChatGPT was queried in a reproducible fashion. To account for variable prompt engineering of our non-validated assessment tool, up to 3 follow-up questions were allowed. Responses were rated on a Likert scale: 5) Correct; 4) Correct, with missing information requiring clarification; 3) Correct, but unable to complete answer; 2) Partially incorrect; 1) Absolutely incorrect. Subgroup analysis was conducted on two binary outcomes: Correctness (scores 3-5 vs 1-2) and Accuracy (scores 4-5 vs 1-3). Variance in ChatGPT responses across cancers was analyzed. Descriptive statistics were used, and significance was tested with binary logistic regression.
Results: A total of 270 questions were generated (range per cancer 32-68). The score frequency distribution was: 5) 45.2%; 4) 19.3%; 3) 13.3%; 2) 13.7%; and 1) 8.5%. On subgroup analysis, responses met the Correctness threshold for 210 questions (77.8%) and the Accuracy threshold for 174 questions (64.4%). The difference in Correctness between cancers was not statistically significant, and there was no statistically significant difference in scores by question type. There was a statistically significant difference in the Accuracy of ChatGPT between cancers (Table).
Conclusions: ChatGPT was significantly more likely to provide accurate responses to questions regarding GA and PN versus AN or SB. It demonstrates a limited capacity to assist with complex clinical decision-making in 6 gastrointestinal cancers, and its Accuracy level is below the acceptable threshold for implementation into clinical use. Further analysis of the expanding capabilities of ChatGPT and other NLP-based tools is warranted in this rapidly evolving domain. Future studies would benefit from a validated grading instrument. [Table: see text]
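The dichotomization used in the subgroup analysis can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The per-score counts are an assumption: the abstract reports only percentages, and the counts below are the integers that reproduce those percentages for 270 questions. The function names are hypothetical.

```python
# Per-score question counts. ASSUMPTION: reconstructed from the reported
# percentages (45.2%, 19.3%, 13.3%, 13.7%, 8.5%) of 270 questions; the
# abstract itself does not list raw counts.
SCORE_COUNTS = {5: 122, 4: 52, 3: 36, 2: 37, 1: 23}


def is_correct(score: int) -> bool:
    """Correctness outcome: Likert scores 3-5 versus 1-2."""
    return score >= 3


def is_accurate(score: int) -> bool:
    """Accuracy outcome: Likert scores 4-5 versus 1-3."""
    return score >= 4


def summarize(counts: dict[int, int]) -> dict[str, float]:
    """Collapse a Likert score distribution into the two binary outcomes."""
    total = sum(counts.values())
    correct = sum(n for score, n in counts.items() if is_correct(score))
    accurate = sum(n for score, n in counts.items() if is_accurate(score))
    return {
        "total": total,
        "correct": correct,
        "accurate": accurate,
        "correct_pct": round(100 * correct / total, 1),
        "accurate_pct": round(100 * accurate / total, 1),
    }


if __name__ == "__main__":
    print(summarize(SCORE_COUNTS))
```

Under these assumed counts, the sketch yields 210 of 270 (77.8%) for Correctness and 174 of 270 (64.4%) for Accuracy, matching the reported figures. The between-cancer comparison by binary logistic regression is not reproduced here because per-cancer data are not given in the abstract.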

Assessment of ChatGPT's adherence to ETA-thyroid nodule management guideline over two different time intervals 14 days apart: in binary and multiple-choice queries

Assessment of ChatGPT's Compliance with ESC-Acute Coronary Syndrome Management Guidelines at 30-Day Intervals

ChatGPT v4 outperforming v3.5 on cancer treatment recommendations in quality, clinical guideline, and expert opinion concordance

Decoding the NCCN Guidelines With AI: A Comparative Evaluation of ChatGPT-4.0 and Llama 2 in the Management of Thyroid Carcinoma

Evaluating the Success of ChatGPT in Addressing Patient Questions Concerning Thyroid Surgery

Artificial intelligence performance in clinical neurology queries: the ChatGPT model

Comparing ChatGPT's and Surgeon's Responses to Thyroid-related Questions From Patients

The Accuracy of Artificial Intelligence ChatGPT in Oncology Examination Questions

S2148 Assessment of ChatGPT-4's Efficacy in Providing Guideline-Directed Assistance to Residents and Fellows for Critical GI Conditions in Acute Settings

Effectiveness of ChatGPT in clinical pharmacy and the role of artificial intelligence in medication therapy management

Evaluating the reliability of ChatGPT as a tool for imaging test referral: a comparative study with a clinical decision support system

Evaluation of ChatGPT-4's Performance in Therapeutic Decision-Making During Multidisciplinary Oncology Meetings for Head and Neck Squamous Cell Carcinoma

Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine

Diagnostic and Management Performance of ChatGPT in Obstetrics and Gynecology

Comparative analysis of ChatGPT-4.0's management of six gastrointestinal cancers according to the NCCN guidelines.

ChatGPT's Efficacy in Queries Regarding Polycystic Ovary Syndrome and Treatment Strategies for Women Experiencing Infertility

Evaluation of the prediagnosis and management of ChatGPT-4.0 in clinical cases in cardiology

Evaluating the Application of ChatGPT in Outpatient Triage Guidance: A Comparative Study

ChatGPT in Iranian medical licensing examination: evaluating the diagnostic accuracy and decision-making capabilities of an AI-based model

Evaluation of the impact of ChatGPT support on acromegaly management and patient education

Current applications and future potential of ChatGPT in radiology: A systematic review