Abstract 13965: Natural Language Processing of Hospitalization Discharge Summary to Predict 1-year Post-Discharge Mortality Among Patients With Acute Heart Failure
Benjamin Rosand, Dragomir Radev, Harlan M Krumholz, Rohan Khera
Yale Sch of Medicine, New Haven, CT; Computer Science, Yale Univ, New Haven, CT; Yale Univ, New Haven, CT
DOI: https://doi.org/10.1161/circ.144.suppl_1.13965
IF: 37.8
2021-11-10
Circulation
Abstract:
Background: Patients hospitalized with heart failure (HF) have a greater than 20% risk of death within 1 year of discharge. Current risk models rely on multiple structured clinical data elements, including vital signs, diagnoses, laboratory tests, and medications, to predict mortality, and they have modest performance (AUROC 0.70-0.74). However, clinical notes, which capture clinicians' assessment of patients and their prognosis, have had a limited role in predictive models because of the challenges of engineering features from unstructured text.
Methods: We identified all primary HF hospitalizations for consecutive patients at Yale between 2013 and 2019 and linked them to the Connecticut death index to ascertain deaths within 1 year of discharge. The discharge summaries for these hospitalizations were split into training (80%) and validation (10%) sets and an independent test set (10%). We used an interpretable term frequency-inverse document frequency (TF-IDF) NLP model to estimate the probability of 1-year death, evaluated the predictions against observed events, presented them as Kaplan-Meier curves, and interpreted the model with a word importance plot.
Results: There were 6702 discharge summaries for patients who survived an HF hospitalization. Their mean age was 78 (SD 15) years, 50% were women, and 1-year mortality was 21%. In the independent test set, the NLP model applied to discharge summaries discriminated 1-year death well, with an AUROC of 0.78 (Figure A) and an F1 score (the harmonic mean of precision and recall) of 0.73. Moreover, the model separated patients by prognosis throughout the post-discharge year (Figure B), suggesting that the notes carry prognostic cues extending beyond the immediate post-discharge period. Word tokens denoting chronic, severe, and end-organ disease were the most frequent predictors of mortality (Figure C).
Conclusions: An NLP model applied to discharge summaries exceeded the reported performance of the best models based on structured data. Risk prediction using unstructured clinical notes could reduce the data-collection burden of risk prediction and risk adjustment while maintaining interpretability.
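The Methods name TF-IDF features but not the classifier that turns them into a probability of 1-year death. The following is a minimal, standard-library-only sketch under stated assumptions: a simple lowercase tokenizer, the smoothed IDF formula that scikit-learn uses by default, and L2-regularized logistic regression trained by batch gradient descent. The classifier, hyperparameters, function names, and toy notes are all hypothetical and are not the authors' pipeline. Ranking tokens by learned weight gives a rough analogue of a word importance plot.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (a simplified, assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, fit on training notes only."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalized sparse TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    """Predicted probability of 1-year death for one sparse TF-IDF vector."""
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in x.items()))


def train_logistic(vectors, labels, epochs=300, lr=1.0, l2=1e-3):
    """Batch gradient descent on L2-regularized log loss (hypothetical defaults)."""
    weights, bias, n = {}, 0.0, len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for x, y in zip(vectors, labels):
            err = predict_proba(x, weights, bias) - y
            for t, v in x.items():
                grad_w[t] = grad_w.get(t, 0.0) + err * v
            grad_b += err
        for t in set(weights) | set(grad_w):
            w = weights.get(t, 0.0)
            weights[t] = w - lr * (grad_w.get(t, 0.0) / n + l2 * w)
        bias -= lr * grad_b / n  # bias is left unregularized
    return weights, bias


def top_tokens(weights, k=5):
    """Tokens with the largest positive weights, i.e. mortality-associated terms."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Toy, made-up notes for illustration only; these are not study data.
train_notes = [
    "severe end stage renal disease, hospice discussed",
    "chronic severe cardiomyopathy, dialysis dependent",
    "stable and euvolemic, discharged home with clinic follow up",
    "symptoms improved, ambulating independently, discharged home",
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_notes)
X = [tfidf(doc, idf) for doc in train_notes]
weights, bias = train_logistic(X, train_labels)
print(top_tokens(weights, k=3))
print(predict_proba(tfidf("severe renal disease", idf), weights, bias))
```

Because the model is linear over named tokens, every weight can be read directly, which is one reason TF-IDF models are described as interpretable. A real pipeline would more likely use scikit-learn's `TfidfVectorizer` and `LogisticRegression` than hand-rolled gradient descent.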
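The data split and the two discrimination metrics reported in the abstract can be illustrated with small, dependency-free helpers. This sketch assumes a seeded random split at the discharge-summary level; the abstract does not say how notes were assigned, and splitting by patient instead would keep repeat admissions of one patient out of more than one set. AUROC is computed as the probability that a randomly chosen death receives a higher score than a randomly chosen survivor (ties count one half), and F1 as the harmonic mean of precision and recall at a hypothetical 0.5 cutoff. Function names and toy scores are illustrative only.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle, then cut into 80% train, 10% validation, remainder test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def auroc(scores, labels):
    """Mann-Whitney form of AUROC: P(score of a death > score of a survivor)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(scores, labels, threshold=0.5):
    """F1 = 2PR / (P + R), with precision P = PPV and recall R = sensitivity."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy predicted probabilities and observed 1-year deaths (made up).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(auroc(scores, labels), f1_score(scores, labels))
```

With 6702 summaries, integer truncation yields 5361 training, 670 validation, and 671 test notes. The pairwise AUROC loop costs O(n_pos * n_neg), which is acceptable at test-set scale; note that F1 depends on the chosen cutoff while AUROC does not.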
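The abstract presents predictions as Kaplan-Meier curves without stating how patients were grouped. A common approach, assumed here, is to stratify test-set patients by predicted risk and estimate survival in each stratum with the product-limit formula S(t) = prod over death times t_i <= t of (1 - d_i / n_i), where d_i is the number of deaths at t_i and n_i the number still at risk, with survivors censored at 365 days. The function name, risk groups, and toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up in days (for example, capped at 365)
    events: 1 if death was observed at that time, 0 if censored
    Returns [(t_i, S(t_i))] at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group everyone sharing this time; censored patients still count as at risk.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Toy strata split at a hypothetical predicted-risk cutoff (made-up data).
high_risk = kaplan_meier([30, 90, 120, 200, 365], [1, 1, 0, 1, 0])
low_risk = kaplan_meier([150, 365, 365, 365, 365], [1, 0, 0, 0, 0])
print(high_risk)
print(low_risk)
```

Curves that stay apart across the full year, rather than converging after the first weeks, are what would indicate that the predicted risk carries prognostic information beyond the immediate post-discharge period.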
cardiac & cardiovascular systems, peripheral vascular disease