Precision of Chatbot Generative Pretrained Transformer Version 4-Generated References for Colon and Rectal Surgical Literature
Aaron L Albuck,Chad M Becnel,Daniel J Sirna,Jacquelyn Turner
DOI: https://doi.org/10.1016/j.jss.2024.07.021
2024-08-08
Abstract: Introduction: The objective is to assess the precision of references generated by Chatbot Generative Pretrained Transformer version 4 (ChatGPT-4) in scientific literature pertaining to colon and rectal surgery. Methods: Ten frequently studied keywords pertaining to colon and rectal surgery were chosen: colon cancer, rectal cancer, anal cancer, total neoadjuvant therapy, diverticulitis, low anterior resection, transanal minimally invasive surgery, ileal pouch anal anastomosis, abdominoperineal resection, and hemorrhoidectomy. ChatGPT-4 was prompted to search for the most representative citations for all keywords.
After this, the references from ChatGPT-4 underwent a thorough review of their key elements: article title, authors, journal name, publication year, and Digital Object Identifier (DOI). Two separate evaluators examined each key element, awarding full accuracy to generated citations in which there were no discrepancies in any of the fields when cross-referenced with the Scopus, Google, and PubMed databases. Results: Forty-one of the 100 references generated by ChatGPT-4 were fully accurate; however, none included a DOI. Partial accuracy was observed in 67 of the references, which were identifiable by title and journal. Performance varied across specific keywords; for example, references for colon and rectal cancer were 100% identifiable by title and journal, but no term had 100% accuracy across all categories. Notably, none of the generated references correctly listed all authors. The study was conducted within a short timeframe, during which ChatGPT-4 was rapidly evolving and updating its knowledge base. Conclusions: While ChatGPT-4 offers improvements over its predecessors and shows potential for use in academic literature, its inconsistent performance across categories, lack of DOIs, and irregularities in authorship listings raise concerns about its readiness for application in the field of colon and rectal surgery research.
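The field-by-field comparison described in the Methods can be illustrated with a minimal sketch. It is not the authors' procedure, which relied on human evaluators cross-referencing databases. The names `Citation`, `normalize`, `field_matches`, `is_fully_accurate`, and `is_identifiable` are hypothetical, the normalization rule (case-insensitive, whitespace collapsed) is an assumption, and the sample records are made-up placeholders rather than study data.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# The five key elements named in the Methods.
FIELDS = ("title", "authors", "journal", "year", "doi")


@dataclass(frozen=True)
class Citation:
    title: str
    authors: Tuple[str, ...]
    journal: str
    year: int
    doi: Optional[str] = None  # None when the generated reference has no DOI


def normalize(value):
    """Assumed normalization: lowercase strings and collapse whitespace."""
    if isinstance(value, str):
        return " ".join(value.lower().split())
    if isinstance(value, tuple):
        return tuple(normalize(v) for v in value)
    return value


def field_matches(generated: Citation, verified: Citation) -> dict:
    """Compare each key element of a generated citation with the verified record."""
    return {
        name: normalize(getattr(generated, name)) == normalize(getattr(verified, name))
        for name in FIELDS
    }


def is_fully_accurate(matches: dict) -> bool:
    """Full accuracy: no discrepancy in any field."""
    return all(matches.values())


def is_identifiable(matches: dict) -> bool:
    """Partial accuracy: identifiable by title and journal."""
    return matches["title"] and matches["journal"]


# Placeholder records for illustration only (not real references).
verified = Citation("Example Title", ("A Author", "B Author"),
                    "Example Journal", 2020, "10.0000/example")
generated = Citation("example  title", ("A Author",),
                     "Example Journal", 2020, None)

matches = field_matches(generated, verified)
print(matches, is_identifiable(matches), is_fully_accurate(matches))
```

In this sketch a reference with an incomplete author list or a missing DOI fails the strict all-fields check while still counting as identifiable, which mirrors the distinction between full and partial accuracy drawn in the Results.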