Cross-institution natural language processing for reliable clinical association studies: a methodological exploration
Madhumita Sushil, Atul J. Butte, Ewoud Schuit, Maarten van Smeden, Artuur M. Leeuwenberg
DOI: https://doi.org/10.1016/j.jclinepi.2024.111258
IF: 7.407
2024-01-17
Journal of Clinical Epidemiology
Abstract:
Objective: Natural language processing (NLP) of clinical notes in electronic medical records is increasingly used to extract patient characteristics that are otherwise sparsely available, and to assess their association with relevant health outcomes. Manual data curation is resource intensive, and NLP methods make these studies more feasible. However, the methodology for using NLP methods reliably in clinical research is understudied. The objective of this study is to investigate how NLP models could be used to extract study variables (specifically, exposures) to reliably conduct exposure-outcome association studies.
Study design and setting: In a convenience sample of patients admitted to the intensive care unit of a US academic health system, multiple association studies were conducted, comparing the association estimates based on NLP-extracted versus manually extracted exposure variables. The association studies varied in NLP model architecture (BERT, LSTM), training paradigm (training a new model or fine-tuning an existing external model), extracted exposures (employment status, living status, and substance use), health outcomes (having a do-not-resuscitate/intubate code, length of stay, and in-hospital mortality), missing data handling (multiple imputation vs. complete case analysis), and the application of measurement error correction (via regression calibration).
Results: The study was conducted on 1,174 participants (median [IQR] age, 61 [50, 73] years; 60.6% male). Additionally, up to 500 discharge reports of participants from the same health system and 2,528 reports of participants from an external health system were used to train the NLP models. Substantial differences were found between the associations based on NLP-extracted and manually extracted exposures in all settings. The error in the association estimates was only weakly correlated with the overall F1-score of the NLP models.
Conclusion: Associations estimated using NLP-extracted exposures should be interpreted with caution. Further research is needed to establish the conditions for reliable use of NLP in medical association studies.
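The abstract names regression calibration as the measurement error correction but gives no implementation details. As a rough illustration only, the Python sketch below simulates a binary exposure X (standing in for, e.g., employment status) that a hypothetical NLP model extracts as W with an assumed sensitivity of 0.8 and specificity of 0.9, a continuous outcome Y (standing in for length of stay) under a linear model, and a manually annotated validation subset of 500 patients. All function names, parameter values, and the linear outcome model are assumptions for illustration, not the authors' setup or results. The sketch fits E[X | W] on the validation subset, substitutes the calibrated exposure for W in the full data, and compares the naive and corrected slopes.

```python
import random


def ols(x, y):
    """Simple least squares fit of y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def simulate(n=20000, beta=2.0, sens=0.8, spec=0.9, prev=0.3, seed=1):
    """Hypothetical data: true exposure x, NLP-extracted exposure w, outcome y."""
    rng = random.Random(seed)
    x = [1 if rng.random() < prev else 0 for _ in range(n)]
    # Non-differential misclassification: w depends only on x, not on y.
    w = [
        (1 if rng.random() < sens else 0) if xi else (0 if rng.random() < spec else 1)
        for xi in x
    ]
    # Assumed linear outcome model with true effect beta.
    y = [1.0 + beta * xi + rng.gauss(0, 1) for xi in x]
    return x, w, y


def regression_calibration(w, y, x_val, w_val):
    """Corrected exposure effect using a manually labelled validation subset."""
    # Step 1: calibration model E[X | W], fitted where manual labels exist.
    a, b = ols(w_val, x_val)
    # Step 2: replace the NLP-extracted exposure by its calibrated value.
    x_hat = [a + b * wi for wi in w]
    # Step 3: fit the outcome model on the calibrated exposure.
    return ols(x_hat, y)[1]


x, w, y = simulate()
naive = ols(w, y)[1]
# Hypothetical validation subset: the first 500 patients are manually annotated.
corrected = regression_calibration(w, y, x[:500], w[:500])
print(f"naive slope: {naive:.2f}, regression-calibrated slope: {corrected:.2f}")
```

Under non-differential misclassification and a linear outcome model, E[Y | W] = α + β E[X | W], so regressing Y on the calibrated exposure recovers β in expectation, whereas the naive slope is attenuated by the factor E[X | W = 1] - E[X | W = 0]. For binary outcomes such as in-hospital mortality fitted with logistic regression, regression calibration is only an approximation.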
Public, Environmental & Occupational Health; Health Care Sciences & Services