Searching in the Dark: Phenotyping Diabetic Retinopathy in a De-Identified Electronic Medical Record Sample of African Americans

Nicole A Restrepo, Eric Farber-Eger, Dana C Crawford
2016-07-20
Abstract: A hurdle to EMR-based studies is the characterization and extraction of complex phenotypes that are not readily defined by single diagnostic or procedural codes. Here we developed an algorithm that uses data mining techniques to identify a diabetic retinopathy (DR) cohort of African Americans with type 2 diabetes from the Vanderbilt University de-identified EMR system. The algorithm combines diagnostic codes, Current Procedural Terminology (CPT) billing codes, medications, and text matching to identify DR when gold-standard digital photography results were unavailable. DR cases were identified with a positive predictive value of 75.3% and an accuracy of 84.8%. Controls were classified with a negative predictive value of 1.0%, as far as could be assessed. Few studies of DR have been performed in African Americans, who are at elevated risk of DR. Identification of EMR-based African American cohorts may help stimulate new biomedical studies that could elucidate differences in risk for the development of DR and other complex diseases.