Abstract:
Background: Generative artificial intelligence (AI) facilitates the development of digital twins, which enable virtual representations of real patients to explore, predict and simulate patient health trajectories, ultimately aiding treatment selection and clinical trial design, among other applications. Recent advances in forecasting with generative AI, in particular large language models (LLMs), highlight untapped potential to overcome real-world data (RWD) challenges such as missingness, noise and limited sample sizes, thus empowering the next generation of AI algorithms in healthcare.
Methods: We developed the Digital Twin - Generative Pretrained Transformer (DT-GPT) model, which leverages biomedical LLMs using rich electronic health record (EHR) data. Our method eliminates the need for data imputation and normalization, enables prediction of clinical variables, and supports exploration of those predictions via a chatbot interface. We analyzed the method's performance on RWD from both a long-term US nationwide non-small cell lung cancer (NSCLC) dataset and a short-term intensive care unit (MIMIC-IV) dataset.
Findings: DT-GPT surpassed state-of-the-art machine learning methods in patient trajectory forecasting, as measured by mean absolute error (MAE), on both the long-term (3.4% MAE improvement) and the short-term (1.3% MAE improvement) datasets. Additionally, DT-GPT was capable of preserving cross-correlations of clinical variables (average R² of 0.98) and of handling missing data as well as noise. Finally, we discovered the ability of DT-GPT both to provide insights into a forecast's rationale and to perform zero-shot forecasting on variables not used during fine-tuning, outperforming even fully trained, leading task-specific machine learning models on 14 clinical variables.
Interpretation: DT-GPT demonstrates that LLMs can serve as a robust medical forecasting platform, empowering digital twins that are able to virtually replicate patient characteristics beyond their training data. We envision that LLM-based digital twins will enable a variety of use cases, including clinical trial simulations, treatment selection and adverse event mitigation.

Large Language Models forecast Patient Health Trajectories enabling Digital Twins

TWIN-GPT: Digital Twins for Clinical Trials via Large Language Model

Critical Care Studies Using Large Language Models Based on Electronic Healthcare Records: A Technical Note

Learning the natural history of human disease with generative transformers

Digital Twin Generators for Disease Modeling

Large Language Models-Enabled Digital Twins for Precision Medicine in Rare Gynecological Tumors

Transformative potential of Large Language Models in data mining on Electronic Health Records

A study of generative large language model for medical research and healthcare

Introducing the Large Medical Model: State of the art healthcare cost and risk prediction with transformers trained on patient event sequences

The future landscape of large language models in medicine

Based on Medicine, The Now and Future of Large Language Models

Large language models to facilitate pregnancy prediction after in vitro fertilization

Large language models streamline automated machine learning for clinical studies

Large language model application in emergency medicine and critical care

Empowering digital twins with large language models for global temporal feature learning

A large language model for electronic health records

A Digital Twins Machine Learning Model for Forecasting Disease Progression in Stroke Patients

The Impact of Multimodal Large Language Models on Health Care's Future