A machine learning model for early diagnosis of type 1 Gaucher disease using real-life data

Avraham Tenenbaum, Shoshana Revel-Vilk, Sivan Gazit, Michael Roimi, Aidan Gill, Dafna Gilboa, Ora Paltiel, Orly Manor, Varda Shalev, Gabriel Chodick
DOI: https://doi.org/10.1016/j.jclinepi.2024.111517
IF: 7.407
2024-09-09
Journal of Clinical Epidemiology
Abstract:
Objective: The diagnosis of Gaucher disease (GD) presents a major challenge due to the high variability and low specificity of its clinical characteristics, along with limited physician awareness of the disease's early symptoms. Early and accurate diagnosis is important to enable effective treatment decisions, prevent unnecessary testing, and facilitate genetic counseling. This study aimed to develop a machine learning (ML) model for GD screening and early diagnosis based on real-world clinical data from the Maccabi Healthcare Services (MHS) electronic database, which contains twenty years of longitudinal data on approximately 2.6 million patients.
Study Design and Setting: We screened the MHS database for patients with GD between January 1998 and May 2022. Eligible controls were matched by year of birth, sex, and socioeconomic status in a 1:13 ratio. The data were partitioned into training (75%) and test (25%) sets, and models were trained to predict GD using features obtained from medical and laboratory records. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
Results: We identified 264 patients with confirmed GD, to whom we matched 3429 controls. The best-performing model (which included known GD signs and symptoms, previously unknown clinical features, and administrative codes) achieved an AUROC of 0.95 ± 0.03 and an AUPRC of 0.80 ± 0.08 on the test set, and identified GD a median of 2.78 years earlier than the clinical diagnosis (25th-75th percentile: 1.29-4.53).
Conclusions: Using an ML approach on real-world data led to excellent discrimination between GD patients and controls, with the ability to detect GD significantly earlier than the time of actual diagnosis. Hence, this approach might be useful as a screening tool for GD and lead to earlier diagnosis and treatment. Furthermore, advanced ML analytics may highlight previously unrecognized features associated with GD, including clinical diagnoses and health-seeking behaviors.
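The evaluation setup described in the abstract (a 75%/25% partition scored with AUROC and AUPRC) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline: the abstract does not state the model type, the splitting procedure, or how the ± intervals were computed. The function names, the stratified split, the fixed seed, and the five toy scores below are all assumptions made for illustration; only the cohort sizes (264 cases, 3429 controls) come from the abstract.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices into train/test, keeping the case:control ratio in both sets.

    Stratification is an assumption; the abstract only states a 75%/25% partition.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def auroc(labels, scores):
    """AUROC: probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPRC estimated as average precision: mean precision at the rank of each case.

    Assumes no tied scores; ties would need threshold-level grouping.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Cohort sizes reported in the abstract: 264 GD cases and 3429 matched controls.
labels = [1] * 264 + [0] * 3429
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))

# Hypothetical toy scores (NOT study data), small enough to check by hand.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(auroc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

With roughly 13 controls per case, AUPRC is the more demanding of the two metrics: a classifier scoring at random would have an AUPRC near the case prevalence (about 0.07 here), whereas its AUROC would sit near 0.5.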
Public, Environmental & Occupational Health; Health Care Sciences & Services