Development and multi-site external validation of a generalizable risk prediction model for bipolar disorder

Colin G. Walsh, Michael A. Ripperger, Yirui Hu, Yi-han Sheu, Hyunjoon Lee, Drew Wilimitis, Amanda B. Zheutlin, Daniel Rocha, Karmel W. Choi, Victor M. Castro, H. Lester Kirchner, Christopher F. Chabris, Lea K. Davis, Jordan W. Smoller
DOI: https://doi.org/10.1038/s41398-023-02720-y
2024-01-26
Translational Psychiatry
Abstract: Bipolar disorder is a leading contributor to disability, premature mortality, and suicide. Early identification of risk for bipolar disorder using generalizable predictive models trained on diverse cohorts around the United States could improve targeted assessment of high-risk individuals, reduce misdiagnosis, and improve the allocation of limited mental health resources. This observational case-control study aimed to develop and validate generalizable predictive models of bipolar disorder as part of the multisite, multinational PsycheMERGE Network across large, diverse biobanks with linked electronic health records (EHRs) from three academic medical centers: in the Northeast (Massachusetts General Brigham), the Mid-Atlantic (Geisinger), and the Mid-South (Vanderbilt University Medical Center). Predictive models were developed and validated with multiple algorithms at each study site: random forests, gradient boosting machines, penalized regression, and stacked ensemble learning algorithms combining them. Predictors were limited to widely available EHR-based features agnostic to a common data model, including demographics, diagnostic codes, and medications. The main study outcome was bipolar disorder diagnosis as defined by the International Cohort Collection for Bipolar Disorder, 2015. In total, the study included records for 3,529,569 patients, including 12,533 cases (0.3%) of bipolar disorder. After internal and external validation, algorithms demonstrated optimal performance in their respective development sites. The stacked ensemble achieved the best combination of overall discrimination (AUC = 0.82–0.87) and calibration performance, with positive predictive values above 5% in the highest-risk quantiles at all three study sites. In conclusion, generalizable predictive models of risk for bipolar disorder can be feasibly developed across diverse sites to enable precision medicine. Comparison of a range of machine learning methods indicated that an ensemble approach provides the best performance overall but requires local retraining. These models will be disseminated via the PsycheMERGE Network website.
psychiatry
What problem does this paper attempt to address?
This paper addresses the early identification of risk for bipolar disorder (BD) by developing and validating a generalizable risk prediction model. Specifically, the study uses electronic health records (EHRs) from different populations across the United States to build prediction models with machine-learning methods, with the goals of improving the assessment of high-risk individuals, reducing misdiagnosis, and optimizing the allocation of limited mental health resources.

### Main research objectives:

1. **Develop a prediction model**: Develop a generalizable model that predicts the risk of bipolar disorder at multiple research centers (Northeast, Mid-Atlantic, and Mid-South).
2. **External validation**: Validate each model at the other research centers to test how well it generalizes.
3. **Algorithm comparison**: Compare the performance of multiple machine-learning algorithms (random forests, gradient boosting machines, penalized regression) to determine the best prediction model.
4. **Ensemble learning**: Improve prediction performance further by combining the individual algorithms in a stacked ensemble.

### Research background:

- **The severity of bipolar disorder**: Bipolar disorder is one of the leading causes of disability, premature death, and suicide.
- **Diagnostic challenges**: Diagnosis of bipolar disorder is often delayed, by 6 to 10 years on average. Because many patients first present with major depression, they are often misdiagnosed with unipolar depression.
- **Limitations of existing methods**: High-risk individuals are currently identified mainly through family history, which has limited coverage, and effective means of early identification are lacking.

### Method overview:

- **Data sources**: EHR data from three large medical centers, covering the records of millions of patients.
- **Model development**: Multiple machine-learning algorithms (random forests, gradient boosting machines, penalized regression) are used to develop prediction models.
- **Features**: Predictors include demographic information, diagnosis codes, and medication records.
- **Model validation**: Models are validated internally at each research center and externally at the other research centers to evaluate how well they generalize.

### Main findings:

- **Model performance**: The stacked ensemble shows the best combination of overall discrimination and calibration at all three research centers, with AUC values ranging from 0.82 to 0.87 and a positive predictive value above 5% in the highest-risk quantiles.
- **Generalization ability**: Each model performs best at the site where it was developed, but models also perform well in external validation, which indicates a degree of generalizability. The abstract notes that the ensemble approach required local retraining.
- **Potential for clinical application**: These models could be used in resource-limited clinical settings to identify individuals most likely to have undiagnosed bipolar disorder or to predict the onset of bipolar disorder.

### Conclusion:

It is feasible to develop and validate a generalizable bipolar disorder risk prediction model through federated analysis across multiple research centers. These models can accelerate risk research on bipolar disorder, support pharmacoepidemiological research, and provide potential tools for precision medicine research. Future work should further evaluate the clinical utility of these models and their potential for quantitative phenotyping of this severe mental illness.
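The two headline metrics in the findings, AUC and positive predictive value (PPV) in the highest-risk quantile, can be made concrete with a short sketch. This is not the study's code. The function names `auc` and `ppv_top_quantile`, the `fraction` parameter, and the ten-patient toy data are all assumptions introduced for illustration. For context, a PPV above 5% against a stated case prevalence of about 0.3% means that more than ten times as many cases appear in the flagged group as in the cohort overall.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen control, counting ties as one half.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")

    pairs = sorted(zip(scores, labels))  # ascending by score
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores occupying positions i .. j-1.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def ppv_top_quantile(
    labels: Sequence[int], scores: Sequence[float], fraction: float = 0.05
) -> float:
    """Share of true cases among the `fraction` of patients with the highest scores."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(len(scores) * fraction))  # size of the flagged group
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


# Toy data (hypothetical): 1 = bipolar disorder case, 0 = control.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]
toy_scores = [0.10, 0.20, 0.90, 0.30, 0.80, 0.15, 0.05, 0.40, 0.35, 0.25]

print(round(auc(toy_labels, toy_scores), 3))
print(ppv_top_quantile(toy_labels, toy_scores, fraction=0.2))
```

Reporting PPV for a fixed top fraction, alongside AUC, reflects how such a model would be used where follow-up capacity is limited: only the highest-scoring patients are assessed, so the case rate within that group matters more than ranking quality across the whole cohort.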