Development and validation of a patient-level model to predict dementia across a network of observational databases

Luis H. John, Egill A. Fridgeirsson, Jan A. Kors, Jenna M. Reps, Ross D. Williams, Patrick B. Ryan, Peter R. Rijnbeek
DOI: https://doi.org/10.1186/s12916-024-03530-9
IF: 9.3
2024-07-29
BMC Medicine
Abstract:
Background: A prediction model can be a useful tool to quantify a patient's risk of developing dementia in the coming years and to guide risk-factor-targeted interventions. Numerous dementia prediction models have been developed, but few have been externally validated, which likely limits their clinical uptake. In our previous work, we had limited success in externally validating some of these existing models because of inadequate reporting. We therefore develop and externally validate novel models to predict dementia in the general population across a network of observational databases. We also assess regularization methods to obtain parsimonious models that are less complex and easier to implement.
Methods: Logistic regression models were developed across a network of five observational databases with electronic health records (EHRs) and claims data to predict 5-year dementia risk in persons aged 55–84. We assessed the regularization methods L1 and Broken Adaptive Ridge (BAR), as well as three candidate predictor sets, to optimize prediction performance. The predictor sets are a baseline set using only age and sex, a full set including all available candidate predictors, and a phenotype set that includes a limited number of clinically relevant predictors.
Results: BAR can be used for variable selection and outperforms L1 when a parsimonious model is desired. Adding candidate predictors for disease diagnosis and drug exposure generally improves the performance of baseline models that use only age and sex. While a model trained on German EHR data saw an increase in AUROC from 0.74 to 0.83 with additional predictors, a model trained on US EHR data showed only a minimal improvement in AUROC, from 0.79 to 0.81. Nevertheless, the latter model, developed using BAR regularization on the clinically relevant predictor set, was ultimately chosen as the best-performing model because it demonstrated more consistent external validation performance and improved calibration.
Conclusions: We developed and externally validated patient-level models to predict dementia. Our results show that, although dementia prediction is driven largely by age, adding predictors based on condition diagnoses and drug exposures further improves prediction performance. BAR regularization outperforms L1 regularization in yielding the most parsimonious yet still well-performing prediction model for dementia.
medicine, general & internal
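The abstract names Broken Adaptive Ridge (BAR) as the regularization method that yields the most parsimonious model. The following is a minimal illustrative sketch of the general BAR idea for logistic regression, not the authors' implementation: it starts from a ridge fit and then repeatedly refits with the penalty `lam * beta_j**2 / beta_prev_j**2`, which pushes weak coefficients to zero. The data are synthetic, and the function names (`solve`, `ridge_logistic`, `bar_logistic`), the penalty strengths `lam` and `xi`, the number of rounds, and the zeroing tolerance are all assumptions made for illustration.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def ridge_logistic(X, y, penalty, max_iter=50):
    """Minimise -loglik(beta) + sum_j penalty[j] * beta[j]**2 with Newton's method."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [2.0 * penalty[j] * beta[j] for j in range(p)]
        hess = [[2.0 * penalty[j] if j == k else 0.0 for k in range(p)] for j in range(p)]
        for row, yi in zip(X, y):
            mu = sigmoid(sum(b * x for b, x in zip(beta, row)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (mu - yi) * row[j]
                for k in range(p):
                    hess[j][k] += w * row[j] * row[k]
        step = solve(hess, grad)
        beta = [b - s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-9:
            break
    return beta

def bar_logistic(X, y, lam=2.0, xi=1.0, rounds=20, tol=1e-6):
    """BAR sketch: iteratively reweighted ridge; column 0 is an unpenalised intercept."""
    p = len(X[0])
    beta = ridge_logistic(X, y, [0.0] + [xi] * (p - 1))  # ridge starting point
    for _ in range(rounds):
        # Substituting beta_j = |beta_prev_j| * gamma_j turns the BAR penalty
        # lam * beta_j**2 / beta_prev_j**2 into a plain ridge penalty lam * gamma_j**2.
        scale = [1.0] + [abs(b) for b in beta[1:]]
        Xs = [[x * s for x, s in zip(row, scale)] for row in X]
        gamma = ridge_logistic(Xs, y, [0.0] + [lam] * (p - 1))
        beta = [g * s for g, s in zip(gamma, scale)]
    # Treat coefficients that have collapsed below tol as exactly zero.
    return [beta[0]] + [b if abs(b) > tol else 0.0 for b in beta[1:]]

# Synthetic data (hypothetical): two informative predictors, two pure-noise predictors.
random.seed(42)
true_beta = [-1.0, 1.5, 1.0, 0.0, 0.0]
X, y = [], []
for _ in range(1000):
    row = [1.0] + [random.gauss(0.0, 1.0) for _ in range(4)]
    prob = sigmoid(sum(b * x for b, x in zip(true_beta, row)))
    X.append(row)
    y.append(1 if random.random() < prob else 0)

beta_bar = bar_logistic(X, y)
print([round(b, 3) for b in beta_bar])
```

In this toy setting the two noise predictors should be dropped while the informative ones are retained, which is the kind of variable selection the abstract attributes to BAR. A real analysis would tune the penalty strength by cross-validation and use a sparse, large-scale solver rather than this dense pure-Python Newton loop.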