Diagnostic suspicion bias and machine learning: Breaking the awareness deadlock for sepsis detection
Varesh Prasad, Baturay Aydemir, Iain E. Kehoe, Chaya Kotturesh, Abigail O’Connell, Brett Biebelberg, Yang Wang, James C. Lynch, Jeremy A. Pepino, Michael R. Filbin, Thomas Heldt, Andrew T. Reisner
DOI: https://doi.org/10.1371/journal.pdig.0000365
2023-11-02
PLOS Digital Health
Abstract: Many early warning algorithms are downstream of clinical evaluation and diagnostic testing, which means that they may not be useful when clinicians fail to suspect illness and fail to order appropriate tests. Depending on how such algorithms handle missing data, they could even indicate "low risk" simply because the tests were never ordered. We considered predictive methodologies to identify sepsis at triage, before diagnostic tests are ordered, in a busy Emergency Department (ED). One algorithm used "bland clinical data" (data available at triage for nearly every patient). The second algorithm added three yes/no questions to be answered after the triage interview. Retrospectively, we studied adult patients from a single ED between 2014 and 2016, separated into training (70%) and testing (30%) cohorts, and a final validation cohort of patients from four EDs between 2016 and 2018. Sepsis was defined per the Rhee criteria. Investigational predictors were demographics and triage vital signs (downloaded from the hospital EMR); past medical history; and the auxiliary queries (answered by chart reviewers who were blinded to all data except the triage note and initial HPI). We developed L2-regularized logistic regression models using greedy forward feature selection. There were 1164, 499, and 784 patients in the training, testing, and validation cohorts, respectively. The bland clinical data model yielded ROC AUCs of 0.78 (0.76–0.81) and 0.77 (0.73–0.81) for training and testing, respectively, and ranged from 0.74 to 0.79 in the four-hospital validation. The second model, which included the auxiliary queries, yielded ROC AUCs of 0.84 (0.82–0.87) and 0.83 (0.79–0.86), and ranged from 0.78 to 0.83 in the four-hospital validation. The first algorithm did not require clinician input but yielded middling performance. The second showed a trend towards superior performance, though it required additional user effort.
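The modeling approach named in the abstract (L2-regularized logistic regression with greedy forward feature selection, scored by ROC AUC) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature names, the synthetic data, the held-out split used for selection, the `min_gain` stopping rule, and the gradient-descent settings are all hypothetical choices made for the example.

```python
import math
import random

def auc(scores, labels):
    """ROC AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def take(X, cols):
    """Keep only the listed feature columns."""
    return [[row[j] for j in cols] for row in X]

def linear_scores(model, X):
    """Linear predictor b + w.x; monotone in predicted probability, so enough for AUC."""
    w, b = model
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]

def fit_logreg_l2(X, y, lam=0.01, lr=0.5, epochs=200):
    """Batch gradient descent on mean logistic loss plus (lam/2)*||w||^2 (intercept unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, z, yi in zip(X, linear_scores((w, b), X), y):
            # Clip z so exp() cannot overflow; err is (predicted probability - label).
            err = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z)))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def greedy_forward_selection(X_tr, y_tr, X_val, y_val, names, min_gain=0.005):
    """Add the feature that most improves held-out AUC; stop when the gain is below min_gain."""
    selected, best_auc = [], 0.5
    remaining = list(range(len(names)))
    while remaining:
        trials = []
        for j in remaining:
            cols = selected + [j]
            model = fit_logreg_l2(take(X_tr, cols), y_tr)
            trials.append((auc(linear_scores(model, take(X_val, cols)), y_val), j))
        top_auc, top_j = max(trials)
        if top_auc - best_auc < min_gain:
            break
        selected.append(top_j)
        remaining.remove(top_j)
        best_auc = top_auc
    return [names[j] for j in selected], best_auc

# Synthetic, standardized stand-ins for triage predictors (hypothetical names and effects).
random.seed(0)
NAMES = ["heart_rate_z", "temperature_z", "noise_z"]

def make_cohort(n):
    """Draw n synthetic patients; the outcome depends on the first two features only."""
    X = [[random.gauss(0, 1) for _ in NAMES] for _ in range(n)]
    y = [int(random.random() < 1 / (1 + math.exp(-(-1 + 2.0 * r[0] + 0.7 * r[1])))) for r in X]
    return X, y

X_tr, y_tr = make_cohort(400)
X_val, y_val = make_cohort(200)
selected, best_auc = greedy_forward_selection(X_tr, y_tr, X_val, y_val, NAMES)
print(selected, round(best_auc, 3))
```

In practice a library implementation (for example, a regularized logistic regression from a standard machine learning package) would replace the hand-written optimizer; the pure-Python version is used here only to keep the sketch self-contained.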
These methods are alternatives to predictive algorithms downstream of clinical evaluation and diagnostic testing. For hospital early warning algorithms, consideration should be given to the bias and usability of the various methods.
Author summary: Predictive algorithms for hospitals often rely on the results of diagnostic tests as predictors of whether patients have serious and unexpected conditions. Strong predictive performance of such algorithms might be misleading for the following reason: doctors may not order the appropriate diagnostic tests unless they already have some level of concern about the patient, so the data will be available when doctors already suspect the correct diagnosis but unavailable when they overlook it. In this manuscript, we consider early sepsis identification and explore two alternative strategies for avoiding any reliance on diagnostic testing: the use of "bland" data that should be available for every patient, and the use of a few objective "yes/no" questions that might be answered for patients with abnormal vital signs, to provide additional information for the predictive algorithms.
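The suspicion bias described above can be illustrated with a small simulation. The sketch below is entirely synthetic: the prevalence, the ordering probabilities, the generic "lab value" distributions, the alert threshold, and the choice to impute a normal value for missing tests are all hypothetical assumptions chosen to make the mechanism visible, and none of the numbers come from the study.

```python
import random

random.seed(0)
FILL_IN_NORMAL = 1.0   # hypothetical "normal" value substituted when the test was never ordered
ALERT_THRESHOLD = 2.0  # hypothetical cutoff above which the algorithm flags high risk

def simulate_patient():
    """Return (septic, suspected, measured_lab) for one synthetic patient."""
    septic = random.random() < 0.2
    suspected = random.random() < (0.7 if septic else 0.1)
    true_lab = random.gauss(3.5, 0.5) if septic else random.gauss(1.0, 0.3)
    measured_lab = true_lab if suspected else None  # test ordered only on suspicion
    return septic, suspected, measured_lab

def lab_based_alert(measured_lab):
    """Flag high risk from the lab value, imputing 'normal' when it is missing."""
    value = FILL_IN_NORMAL if measured_lab is None else measured_lab
    return value > ALERT_THRESHOLD

def sensitivity(patients):
    """Fraction of the given septic patients that the alert flags."""
    return sum(lab_based_alert(lab) for _, _, lab in patients) / len(patients)

cohort = [simulate_patient() for _ in range(5000)]
septic_suspected = [p for p in cohort if p[0] and p[1]]
septic_missed = [p for p in cohort if p[0] and not p[1]]
sens_suspected = sensitivity(septic_suspected)
sens_missed = sensitivity(septic_missed)
print(f"sensitivity when clinicians suspected sepsis: {sens_suspected:.2f}")
print(f"sensitivity when clinicians did not:          {sens_missed:.2f}")
```

Under these assumptions the alert looks accurate on patients who were tested, yet it never fires for septic patients whom clinicians did not suspect, because their missing result is treated as normal. Those are the cases where an early warning would matter most.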