Abstract e13654
Background: The integration of Natural Language Processing (NLP) into healthcare holds tremendous promise. ChatGPT-4.0 (OpenAI, San Francisco, CA) is a widely recognized large language model that can comprehend and generate text, answer questions, and perform other language-related tasks. However, pitfalls and errors have been described in its clinical application. We explored the ability of ChatGPT-4.0 (ChatGPT) to guide clinical decision-making in 6 gastrointestinal cancers using the National Comprehensive Cancer Network (NCCN) Clinical Practice Guidelines as a framework.
Methods: We reviewed the NCCN Guidelines for Anal Squamous Cell Carcinoma (AN), Small Bowel Adenocarcinoma (SB), Ampullary Adenocarcinoma (AA), Biliary Tract Cancers (BT), Pancreatic Adenocarcinoma (PN), and Gastric Cancer (GA). Up to 2 clinical questions were designed for each decision-making page. Questions were categorized by type (Workup, Treatment, Surveillance, Diagnostics, or References). ChatGPT was queried in a reproducible fashion. To account for variable prompt engineering of our non-validated assessment tool, up to 3 follow-up questions were allowed. Responses were rated on a Likert scale: 5) Correct; 4) Correct, with missing information requiring clarification; 3) Correct, but unable to complete answer; 2) Partially incorrect; 1) Absolutely incorrect. Subgroup analysis was conducted on Correctness (defined as scores 1-2 vs 3-5) and Accuracy (scores 1-3 vs 4-5). Variance in ChatGPT responses across cancers was analyzed. Descriptive statistics were used, and significance was tested with binary logistic regression.
Results: A total of 270 questions were generated (range per cancer 32-68). The score frequency distribution was: 5) 45.2%; 4) 19.3%; 3) 13.3%; 2) 13.7%; and 1) 8.5%. On subgroup analysis, Correctness was seen in 210 (77.8%) of questions, and Accuracy in 174 (64.4%). The difference in Correctness scores between cancers was not statistically significant, and there was no statistically significant difference in scores by question type. There was a statistically significant difference in the Accuracy of ChatGPT between cancers (Table).
Conclusions: ChatGPT was significantly more likely to provide accurate responses to questions regarding GA and PN versus AN or SB. It demonstrates a limited capacity to assist with complex clinical decision-making in 6 gastrointestinal cancers, and its Accuracy level is below the acceptable threshold for implementation into clinical use. Further analysis of the expanding capabilities of ChatGPT and other NLP-based tools is warranted in this rapidly evolving domain. Future studies would benefit from a validated grading instrument. [Table: see text]
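The two subgroup definitions in the abstract (Correctness as scores 3-5, Accuracy as scores 4-5) can be illustrated with a minimal sketch. The per-score counts below are an assumption: the abstract reports only percentages, so the counts were back-calculated from those percentages and the total of 270 questions. The function name `dichotomize` is hypothetical and is not taken from the study.

```python
# Per-score counts reconstructed from the reported percentages of 270 questions.
# ASSUMPTION: the abstract gives percentages only; these integers are the
# counts that round to 45.2%, 19.3%, 13.3%, 13.7%, and 8.5% respectively.
SCORE_COUNTS = {5: 122, 4: 52, 3: 36, 2: 37, 1: 23}


def dichotomize(score_counts, threshold):
    """Count responses scoring at or above `threshold` on the Likert scale.

    Returns (hits, total, percent), with percent rounded to one decimal place.
    """
    total = sum(score_counts.values())
    hits = sum(n for score, n in score_counts.items() if score >= threshold)
    return hits, total, round(100 * hits / total, 1)


# Correctness: scores 3-5 versus 1-2.
correctness = dichotomize(SCORE_COUNTS, threshold=3)
# Accuracy: scores 4-5 versus 1-3.
accuracy = dichotomize(SCORE_COUNTS, threshold=4)

print("Correctness:", correctness)
print("Accuracy:", accuracy)
```

Under these reconstructed counts, the sketch reproduces the reported subgroup totals of 210 of 270 (77.8%) for Correctness and 174 of 270 (64.4%) for Accuracy. It does not attempt the binary logistic regression used for the between-cancer comparison, since the per-cancer data are not given here.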

Evaluating ChatGPT text mining of clinical records for companion animal obesity monitoring

Evaluating ChatGPT text-mining of clinical records for obesity monitoring

Uncovering Language Disparity of ChatGPT in Healthcare: Non-English Clinical Environment for Retinal Vascular Disease Classification (Preprint)

Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification

A critical assessment of using ChatGPT for extracting structured data from clinical notes

ChatGPT in healthcare: A taxonomy and systematic review

Comparative analysis of ChatGPT-4.0's management of six gastrointestinal cancers according to the NCCN guidelines.

Large Language Models for Efficient Medical Information Extraction

ChatGPT in Veterinary Medicine: A Practical Guidance of Generative Artificial Intelligence in Clinics, Education, and Research

Assessing the Accuracy of Responses by the Language Model ChatGPT to Questions Regarding Bariatric Surgery

Explainable text-tabular models for predicting mortality risk in companion animals

Classification performance and reproducibility of GPT-4 omni for information extraction from veterinary electronic health records

Text mining for disease surveillance in veterinary clinical data: part two, training computers to identify features in clinical text

Applications of the Natural Language Processing Tool ChatGPT in Clinical Practice: Comparative Study and Augmented Systematic Review

Effectiveness of ChatGPT in explaining complex medical reports to patients

Assessing ChatGPT's capacity for clinical decision support in pediatrics: A comparative study with pediatricians using KIDMAP of Rasch analysis

Are Different Versions of ChatGPT's Ability Comparable to the Clinical Diagnosis Presented in Case Reports? A Descriptive Study

Using a gradient boosted model for case ascertainment from free-text veterinary records

P467 Towards AI-Augmented Clinical Decision Making: An Examination of ChatGPT's Utility in Acute Ulcerative Colitis Presentations

Evaluation of ChatGPT as a diagnostic tool for medical learners and clinicians

ChatGPT/GPT‐4 (large language models): Opportunities and challenges of perspective in bariatric healthcare professionals