Constructing synthetic datasets with generative artificial intelligence to train large language models to classify acute renal failure from clinical notes

Onkar Litake, Brian H Park, Jeffrey L Tully, Rodney A Gabriel
DOI: https://doi.org/10.1093/jamia/ocae081
2024-05-20
Abstract:
Objectives: To compare the performance of a language-model-based classifier when trained on synthetic versus authentic clinical notes.
Materials and methods: A classifier using language models was developed to identify acute renal failure. Four types of training data were compared: (1) notes from MIMIC-III; and (2, 3, and 4) synthetic notes generated by ChatGPT with lengths of 15 (GPT-15 sentences), 30 (GPT-30 sentences), and 45 (GPT-45 sentences) sentences, respectively. The area under the receiver operating characteristic curve (AUC) was calculated on a test set drawn from MIMIC-III.
Results: With RoBERTa, the AUCs were 0.84, 0.80, 0.84, and 0.76 for the MIMIC-III, GPT-15 sentences, GPT-30 sentences, and GPT-45 sentences training sets, respectively.
Discussion: Training language models to detect acute renal failure from clinical notes yielded similar performance with synthetic and authentic training data.
Conclusion: Training data derived from protected health information may not be needed.
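The AUC reported above can be read as the probability that a randomly chosen positive note (acute renal failure) receives a higher classifier score than a randomly chosen negative note, with ties counted as one half. Below is a minimal Python sketch of that definition. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations. They are not the authors' evaluation code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney (pairwise ranking) formulation.

    labels: 1 for a positive note (e.g. acute renal failure), 0 otherwise.
    scores: classifier scores; higher means "more likely positive".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    # Count every (positive, negative) pair the classifier orders correctly;
    # tied scores contribute half a win.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy test set: 3 positive and 3 negative notes.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(round(roc_auc(labels, scores), 3))  # 7 of 9 pairs ordered correctly -> 0.778
```

In practice, a library routine such as scikit-learn's `roc_auc_score(y_true, y_score)` returns the same value. The pairwise loop costs O(n·m) for n positives and m negatives, and it is shown here only because it makes the definition explicit.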