Predictive and Explainable Analysis of Post-operative Acute Kidney Injury in Children undergoing Cardiopulmonary Bypass: An Application of Large Language Models

Mansour T.A. Sharabiani, Alireza S Mahani, Alex Bottle, Yadav Srinivasan, Richard W Issitt, Serban Stoica
DOI: https://doi.org/10.1101/2024.05.14.24307372
2024-05-17
MedRxiv
Abstract:
Motivation: Embedding large language models (LLMs) hold promise for predictive analytics due to their natively consistent numeric output. This study explores the utility of general-purpose LLMs for extracting interpretable and actionable insights from electronic health record (EHR) text columns in tabular data, focusing on paediatric cardiopulmonary bypass (CPB).
Methods: We analysed data from 963 paediatric CPB operations in the UK (2019-2021), treating the severity of post-operative acute kidney injury (AKI) as a binary outcome and using the text columns that document planned surgical procedures and patient diagnoses for each operation as features. We employed OpenAI's "text-embedding-3-large" for embeddings and "gpt-4-turbo" for generating descriptive labels of patient clusters, which were formed by applying spherical k-means to the embedding vectors. These "AI clusters" were compared against "expert clusters", based on the Partial Risk Adjustment in Surgery (PRAiS) v2 protocol, for 1) consistency, using the adjusted Rand index (ARI) between the two sets of clusters, and 2) predictive performance, using the area under the ROC curve (AUC) of logistic regression models with cluster membership as a categorical variable.
Results: AI clusters showed statistically significant levels of consistency with expert clusters, evidenced by ARIs of 0.31 and 0.32 for planned procedures and diagnoses, respectively. The two sets of clusters demonstrated comparable discriminative power in predicting severe post-operative kidney injury, with AUCs of 0.63 vs. 0.60 (planned procedures) and 0.56 vs. 0.58 (diagnoses) for AI clusters vs. expert clusters. Notably, AI clusters identified three interventions with significant odds ratios for AKI severity, highlighting potential areas for clinical focus. Replacing k-means clustering followed by logistic regression with k-nearest-neighbours, applied directly to the LLM text embeddings, further improved AUC to 0.66 and 0.63 for planned procedures and diagnoses, respectively.
Conclusions: Our findings affirm the potential of LLMs as effective tools in medical text analysis, facilitating both exploratory and predictive tasks. The integration of LLM-derived insights with traditional data analysis methods can significantly enhance risk stratification and outcome prediction in paediatric CPB, underscoring the value of AI-driven approaches in complex healthcare datasets.
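The evaluation described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the embeddings, "expert" labels, and outcome below are synthetic stand-ins (the abstract's data are not public here), the helper names `spherical_kmeans` and `cluster_auc` are hypothetical, and the choices of three clusters, 5-fold cross-validation, and 25 neighbours are assumptions made only for illustration. In practice `X` would hold the "text-embedding-3-large" vectors of the procedure or diagnosis text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import adjusted_rand_score, roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier


def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Cluster unit-normalised rows of X by cosine similarity."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to the centre with the highest cosine similarity.
        labels = np.argmax(X @ centers.T, axis=1)
        for j in range(k):
            total = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:
                # Re-project the centre onto the unit sphere.
                centers[j] = total / norm
    return np.argmax(X @ centers.T, axis=1)


def cluster_auc(labels, y, k):
    """Cross-validated AUC of logistic regression on one-hot cluster membership."""
    onehot = np.eye(k)[labels]
    proba = cross_val_predict(
        LogisticRegression(), onehot, y, cv=5, method="predict_proba"
    )[:, 1]
    return roc_auc_score(y, proba)


# Synthetic stand-in data (assumption): 3 latent groups with different AKI risk.
rng = np.random.default_rng(42)
n, d, k = 600, 16, 3
expert = rng.integers(0, k, size=n)            # stand-in for expert clusters
prototypes = rng.normal(size=(k, d))
X = prototypes[expert] + 0.3 * rng.normal(size=(n, d))  # stand-in embeddings
y = rng.binomial(1, np.array([0.1, 0.3, 0.6])[expert])  # binary outcome

ai = spherical_kmeans(X, k)                    # "AI clusters"
ari = adjusted_rand_score(expert, ai)          # consistency with expert clusters
auc_ai = cluster_auc(ai, y, k)
auc_expert = cluster_auc(expert, y, k)

# k-nearest-neighbours applied directly to the (normalised) embeddings.
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
knn_proba = cross_val_predict(
    KNeighborsClassifier(n_neighbors=25), X_unit, y, cv=5, method="predict_proba"
)[:, 1]
auc_knn = roc_auc_score(y, knn_proba)

print(f"ARI={ari:.2f}  AUC(AI)={auc_ai:.2f}  "
      f"AUC(expert)={auc_expert:.2f}  AUC(kNN)={auc_knn:.2f}")
```

The numbers printed by this sketch reflect only the synthetic data and say nothing about the study's results; the point is the shape of the comparison, in which cluster membership enters the regression as a categorical variable while k-nearest-neighbours works on the embedding vectors themselves.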