Characterizing the limitations of using diagnosis codes in the context of machine learning for healthcare

Lin Lawrence Guo, Keith E. Morse, Catherine Aftandilian, Ethan Steinberg, Jason Fries, Jose Posada, Scott Lanyon Fleming, Joshua Lemmon, Karim Jessa, Nigam Shah, Lillian Sung
DOI: https://doi.org/10.1186/s12911-024-02449-8
IF: 3.298
2024-02-16
BMC Medical Informatics and Decision Making
Abstract: Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-based labels for 7 selected outcomes at three institutions. Secondary objectives were to describe agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels.
medical informatics
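
The secondary objectives named in the abstract (agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels) can be illustrated with a minimal sketch. It assumes binary labels per patient encounter, treats the lab-based label as the reference, and uses Cohen's kappa as the agreement measure; the abstract does not say which agreement statistic the authors used, so kappa is an assumption here, and the function name `label_metrics` and the toy data are hypothetical.

```python
import math


def label_metrics(lab, dx):
    """Compare diagnosis-based labels (dx) with lab-based reference labels (lab).

    Both inputs are equal-length sequences of 0/1 values.
    Cohen's kappa is used for agreement (an assumption, not taken from the paper).
    """
    if len(lab) != len(dx) or len(lab) == 0:
        raise ValueError("lab and dx must be non-empty and the same length")

    # Confusion-matrix counts, with the lab-based label as the reference.
    tp = sum(1 for l, d in zip(lab, dx) if l == 1 and d == 1)
    tn = sum(1 for l, d in zip(lab, dx) if l == 0 and d == 0)
    fp = sum(1 for l, d in zip(lab, dx) if l == 0 and d == 1)
    fn = sum(1 for l, d in zip(lab, dx) if l == 1 and d == 0)
    n = tp + tn + fp + fn

    # Sensitivity: share of lab-positive cases that also carry the diagnosis code.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    # Specificity: share of lab-negative cases without the diagnosis code.
    specificity = tn / (tn + fp) if (tn + fp) else math.nan

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else math.nan

    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


# Toy data (made up for illustration): 4 lab-positive and 6 lab-negative cases.
lab_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
dx_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = label_metrics(lab_labels, dx_labels)
print(metrics)
```

In this toy example only half of the lab-positive cases carry a diagnosis code, so sensitivity is 0.5 even though specificity stays high, which is the kind of coverage gap the abstract describes as underexplored.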