Abstract:

Background: Polygenic risk scores (PRS) have ushered in a new era in genetic epidemiology, offering insights into individual predispositions to a wide range of diseases. Despite recent marked improvements in their predictive power, several challenges must be overcome before PRS-based models can be broadly applied in the clinic, including sufficient accuracy, easy interpretability, and portability across diverse populations.

Methods: Leveraging trans-ancestry genome-wide association study (GWAS) meta-analysis, we generated novel, diverse summary statistics for 30 medically related traits and used them to benchmark the performance of six existing PRS algorithms in the UK Biobank. Observing that SBayesRC had the best overall performance, but recognizing strengths in each method, we developed an ensemble PRS model that uses logistic regression to combine outputs from the top-performing algorithms. This ensemble model was validated on the diverse eMERGE and PAGE MEC cohorts, and its performance was compared against current state-of-the-art PRS models. To enhance predictive accuracy for clinical application, we incorporated easily accessible clinical characteristics such as age, gender, ancestry, and risk factors, creating disease prediction models intended as prospective diagnostic tests with easily interpretable positive or negative outcomes.

Results: Predictive performance of PRS models improved with trans-ancestry GWAS meta-analysis and was further enhanced by the ensemble model, which surpassed state-of-the-art PRS models. When the models were applied to external cohorts, performance drops were minimal, indicating good calibration. After clinical characteristics were added, 12 of 30 models surpassed 80% AUC. Further, 25 traits exceeded a diagnostic odds ratio (DOR) of 5 and 19 traits exceeded a DOR of 10 in all ancestry groups, indicating high predictive value. The highest DOR in a population with a sufficient number of cases was 66.2, for Alzheimer's disease in Europeans. Our PRS model for coronary artery disease identified 55 to 80 times more true coronary events than rare pathogenic variant models, reinforcing its clinical potential. The polygenic component modulated the effect of high-risk rare variants, stressing the need to consider all genetic components in clinical settings.

Conclusions: The newly developed PRS-based disease prediction models have sufficient accuracy and portability to warrant consideration for clinical use.
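
For readers unfamiliar with the diagnostic odds ratio reported above, the following is a minimal sketch of how a DOR can be computed from the confusion-matrix counts of a binary test, using the standard definition DOR = (TP × TN) / (FP × FN). The function name, the example counts, and the 0.5 continuity correction applied when a cell is zero are illustrative assumptions and are not taken from the study.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn, correction=0.5):
    """Return the diagnostic odds ratio (TP * TN) / (FP * FN).

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives. If any cell is zero, a
    continuity correction (assumed here to be 0.5) is added to every
    cell so the ratio stays finite. This is an illustrative choice,
    not the procedure described in the abstract.
    """
    cells = (tp, fp, fn, tn)
    if any(c < 0 for c in cells):
        raise ValueError("counts must be non-negative")
    if any(c == 0 for c in cells):
        tp, fp, fn, tn = (c + correction for c in cells)
    return (tp * tn) / (fp * fn)


# Hypothetical counts: 80 cases flagged, 20 cases missed,
# 10 controls flagged, 90 controls correctly cleared.
example_dor = diagnostic_odds_ratio(tp=80, fp=10, fn=20, tn=90)
print(example_dor)
```

A DOR of 1 means the test does not discriminate between cases and controls; larger values mean the odds of a positive result are correspondingly higher among cases than among controls.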
