Assessing the efficacy of artificial intelligence to provide peri‐operative information for patients with a stoma

Bryan Lim, Gabriel Lirios, Aditya Sakalkale, Shriranshini Satheakeerthy, Diana Hayes, Justin M.C. Yeung
DOI: https://doi.org/10.1111/ans.19337
IF: 1.7
2024-12-04
ANZ Journal of Surgery
Abstract: Our study investigates the potential of artificial intelligence (AI) to aid stoma therapy management after colorectal surgery, given the challenges of providing comprehensive patient education and support. We evaluated four prominent large language models (LLMs) for their effectiveness as supplementary clinical tools and found variation in readability and reliability. While certain models, notably CoPilot and ChatGPT‐4, demonstrated superior performance in key metrics, the study emphasizes the early stage of LLM technology in clinical applications. Despite the promise of AI in enhancing patient education and support, the limitations and contextual relevance of these models must be considered in stoma management.

Background: Stomas present significant lifestyle and psychological challenges for patients, who require comprehensive education and support. Current educational methods are limited in how well they deliver relevant information to the patient, highlighting a potential role for artificial intelligence (AI). This study examined the utility of AI in enhancing stoma therapy management following colorectal surgery.

Material and Methods: We compared four prominent large language models (LLMs), namely OpenAI's ChatGPT‐3.5 and ChatGPT‐4.0, Google's Gemini, and Bing's CoPilot, against a series of metrics to evaluate their suitability as supplementary clinical tools. Through qualitative and quantitative analyses, including readability scores (Flesch–Kincaid, Flesch Reading Ease, and Coleman–Liau index) and reliability assessments (Likert scale, DISCERN score, and QAMAI tool), the study aimed to assess the appropriateness of LLM‐generated advice for patients managing stomas.

Results: Readability and reliability varied across the evaluated models, with CoPilot and ChatGPT‐4 demonstrating superior performance in several key metrics such as readability and comprehensiveness. However, LLM technology is still at an early stage in clinical applications. All responses required a high school to college level of education to comprehend comfortably. While the LLMs addressed users' questions directly, they did not incorporate patient‐specific factors such as past medical history, so their responses were broad and generic rather than tailored.

Conclusion: The complexity of individual patient conditions can challenge AI systems. The use of LLMs in clinical settings holds promise for improving patient education and stoma management support, but requires careful consideration of the models' capabilities and the context of their use.
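The three readability indices named in the methods are closed-form formulas over counts of sentences, words, syllables, and letters. The sketch below applies the standard published formulas and is only an illustration: the abstract does not say which software the authors used. The function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a rough vowel-group heuristic, so scores will differ slightly from dictionary-based tools.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups (heuristic, an assumption of this sketch)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in endings such as '-le' or '-ee'.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease, Flesch-Kincaid Grade Level and Coleman-Liau index."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    letters = sum(len(re.sub(r"[^A-Za-z]", "", w)) for w in words)
    syllables = sum(count_syllables(w) for w in words)

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    letters_per_100_words = letters / len(words) * 100
    sentences_per_100_words = len(sentences) / len(words) * 100

    return {
        # Higher score means easier text.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate US school grade needed to follow the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
        # Grade estimate based on letters rather than syllables.
        "coleman_liau_index": 0.0588 * letters_per_100_words
        - 0.296 * sentences_per_100_words
        - 15.8,
    }


if __name__ == "__main__":
    sample = "Empty the pouch when it is one third full. Wash your hands first."
    for name, value in readability(sample).items():
        print(f"{name}: {value:.1f}")
```

A lower Flesch Reading Ease and higher grade-level values indicate harder text, which is the sense in which the responses were found to need a high school to college level of education.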
surgery