Abstract:

Background: Polygenic risk scores (PRS) have ushered in a new era in genetic epidemiology, offering insights into individual predispositions to a wide range of diseases. Despite recent marked gains in predictive power, PRS-based models must still overcome several challenges before they can be broadly applied in the clinic, including sufficient accuracy, easy interpretability, and portability across diverse populations.

Methods: Leveraging trans-ancestry genome-wide association study (GWAS) meta-analysis, we generated novel, diverse summary statistics for 30 medically related traits and used them to benchmark the performance of six existing PRS algorithms in the UK Biobank. Observing that SBayesRC had the best overall performance, but recognizing strengths in each method, we developed an ensemble PRS model that uses logistic regression to combine outputs from the top-performing algorithms. This ensemble model was validated on the diverse eMERGE and PAGE MEC cohorts, and its performance was compared against current state-of-the-art PRS models. To enhance predictive accuracy for clinical application, we incorporated easily accessible clinical characteristics such as age, gender, ancestry, and risk factors, creating disease prediction models intended as prospective diagnostic tests with easily interpretable positive or negative outcomes.

Results: Predictive performance of PRS models improved with trans-ancestry GWAS meta-analysis and was further enhanced by the ensemble model, which surpassed state-of-the-art PRS models. When the models were applied to external cohorts, performance drops were minimal, indicating good calibration. After adding clinical characteristics, 12 out of 30 models surpassed 80% AUC. Further, 25 traits exceeded a diagnostic odds ratio (DOR) of 5 and 19 traits exceeded a DOR of 10 for all ancestry groups, indicating high predictive value. The highest DOR in a population with a sufficient number of cases was 66.2, for Alzheimer's disease in Europeans. Our PRS model for coronary artery disease identified 55-80 times more true coronary events than rare pathogenic variant models, reinforcing its clinical potential. The polygenic component modulated the effect of high-risk rare variants, stressing the need to consider all genetic components in clinical settings.

Conclusions: The newly developed PRS-based disease prediction models have sufficient accuracy and portability to warrant consideration for clinical use.
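For readers unfamiliar with the diagnostic odds ratio reported above, the following is a minimal sketch of how a DOR is computed from the confusion matrix of a binary test, using the standard definition DOR = (TP × TN) / (FP × FN). The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the study, and the abstract does not say how the authors handled zero cells.

```python
def diagnostic_odds_ratio(tp: float, fp: float, fn: float, tn: float) -> float:
    """Return the diagnostic odds ratio (TP * TN) / (FP * FN).

    Assumption (not from the source): if FP or FN is zero, 0.5 is added
    to every cell (Haldane-Anscombe correction) so the ratio stays finite.
    """
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("cell counts must be non-negative")
    if fp == 0 or fn == 0:
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    return (tp * tn) / (fp * fn)


def dor_from_sens_spec(sensitivity: float, specificity: float) -> float:
    """Equivalent form: odds of a positive test in cases divided by
    odds of a positive test in controls."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly in (0, 1)")
    odds_positive_in_cases = sensitivity / (1 - sensitivity)
    odds_positive_in_controls = (1 - specificity) / specificity
    return odds_positive_in_cases / odds_positive_in_controls


# Illustrative counts only: 100 cases (80 flagged) and 1000 controls (100 flagged).
example_dor = diagnostic_odds_ratio(tp=80, fp=100, fn=20, tn=900)
print(example_dor)
```

In this made-up example, sensitivity is 0.80 and specificity is 0.90, and both functions agree on the result. A DOR of 1 means the test does not discriminate between cases and controls, so thresholds such as 5 or 10 indicate progressively stronger separation.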
