Abstract: Population differences in risk of disease are common, but the potential genetic basis for these differences is not well understood. A standard approach is to compare genetic risk across populations by testing for mean differences in polygenic scores, but existing studies that use this approach do not account for statistical noise in effect estimates (i.e., the GWAS betas) that arises from the finite sample size of GWAS training data. Here, we show using Bayesian polygenic score methods that the level of uncertainty in estimates of genetic risk differences across populations is highly dependent on the GWAS training sample size, the polygenicity (number of causal variants), and the genetic distance (F_ST) between the populations considered. We derive a Wald test for formally assessing the difference in genetic risk across populations, which we show to have calibrated type 1 error rates under the simplifying assumption that all SNPs are independent, an assumption we approximate in practice using linkage disequilibrium (LD) pruning. We further provide closed-form expressions for assessing the uncertainty in estimates of relative genetic risk across populations under the special case of an infinitesimal genetic architecture. We suggest that for many complex traits and diseases, particularly those with more polygenic architectures, current GWAS sample sizes are insufficient to detect moderate differences in genetic risk across populations, though more substantial differences in relative genetic risk (relative risk > 1.5) can be detected. We show that conventional approaches that do not account for sampling error from the training sample, such as a simple t-test, have very high type 1 error rates. When applying our approach to prostate cancer, we demonstrate a higher genetic risk in men of African ancestry, with lower risk in men of European ancestry, followed by men of East Asian ancestry.
Many diseases and complex traits, such as prostate cancer, exhibit differences in incidence across populations. Yet the potential contribution of genetic factors towards such disparities is unclear. Polygenic scores summarise genetic effects across the genome and can in principle provide a valuable tool for assessing and comparing disease risk across populations. In practice, current approaches based on polygenic scores assume that such scores measure genetic risk of disease without error, and thus do not account for the uncertainty that arises when a score is constructed from a finite genome-wide association study (GWAS) training sample, which can be substantial. We introduce a Bayesian approach based on the LDpred2 polygenic score model that fully accounts for training sample uncertainty, and we propose a Wald test for formally testing such genetic risk differences across populations. Simulations show that the method properly controls the type 1 error rate assuming independent SNPs (achieved by pruning), and that statistical power is sensitive to both the genetic architecture (heritability and polygenicity) and the training sample size. In application to prostate cancer, this framework enables us to identify a higher genetic risk in men of African ancestry, with lower risk in men of European ancestry, followed by men of East Asian ancestry.
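
To make the idea of a Wald test on a polygenic score difference concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes independent (LD-pruned) SNPs, treats population allele frequencies as known constants, and takes a posterior mean and posterior variance for each SNP effect as given (for example, summarised from posterior samples of a Bayesian polygenic score model). The function name, argument names, and example numbers are all hypothetical.

```python
import math

import numpy as np


def wald_test_risk_difference(beta_mean, beta_var, freq_pop1, freq_pop2):
    """Wald test for a difference in mean polygenic score between two populations.

    Assumptions (illustrative, not taken from the paper):
      * SNPs are independent, so per-SNP variances simply add up;
      * allele frequencies are treated as fixed, known quantities;
      * beta_mean / beta_var are posterior means and variances of SNP effects.
    """
    beta_mean = np.asarray(beta_mean, dtype=float)
    beta_var = np.asarray(beta_var, dtype=float)
    freq_pop1 = np.asarray(freq_pop1, dtype=float)
    freq_pop2 = np.asarray(freq_pop2, dtype=float)

    # Difference in expected allele count per SNP (2p for a diploid genome).
    delta = 2.0 * (freq_pop1 - freq_pop2)

    # Estimated difference in mean polygenic score, and its variance
    # propagated from uncertainty in the effect estimates only.
    diff = float(np.sum(delta * beta_mean))
    var = float(np.sum(delta ** 2 * beta_var))
    if var <= 0.0:
        raise ValueError("variance of the score difference must be positive")

    # Wald statistic, compared against a chi-square with 1 degree of freedom.
    # For 1 df, P(chi2 > w) = erfc(sqrt(w / 2)).
    wald = diff ** 2 / var
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return diff, var, wald, p_value


if __name__ == "__main__":
    # Toy, made-up inputs for two SNPs.
    diff, var, wald, p = wald_test_risk_difference(
        beta_mean=[0.1, -0.2],
        beta_var=[0.01, 0.04],
        freq_pop1=[0.5, 0.3],
        freq_pop2=[0.4, 0.4],
    )
    print(f"difference={diff:.4f} variance={var:.4f} Wald={wald:.3f} p={p:.4f}")
```

In this sketch the denominator grows with the posterior variance of the effects, so a smaller training sample (larger variances) gives a smaller Wald statistic for the same estimated difference. A test that plugs in the point estimates as if they were exact ignores this term entirely.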

Simulating genetic risk scores from summary statistics

Evaluating the predictive value of genetic risk score in colorectal cancer among Chinese Han population

Genetic Risk Score: Principle, Methods and Application

[The Application of Genetic Risk Score in Genetic Studies of Complex Human Diseases].

A Non-Parametric Method for Building Predictive Genetic Tests on High-Dimensional Data

Methodologies underpinning polygenic risk scores estimation: a comprehensive overview

Development and Standardization of an Improved Type 1 Diabetes Genetic Risk Score for Use in Newborn Screening and Incident Diagnosis

Estimation of offspring genetic risk scores using parental genotypes.

Effective Genetic Risk Prediction Using Mixed Models

Type 1 Diabetes Risk in African-Ancestry Participants and Utility of an Ancestry-Specific Genetic Risk Score

Alternative Methods for H1 Simulations in Genome Wide Association Studies

GWASBrewer: An R Package for Simulating Realistic GWAS Summary Statistics

Evaluating the Effect of Multiple Genetic Risk Score Models on Colorectal Cancer Risk Prediction.

Summaryauc: A Tool For Evaluating The Performance Of Polygenic Risk Prediction Models In Validation Datasets With Only Summary Level Statistics

Calculating Polygenic Risk Scores (PRS) in UK Biobank: A Practical Guide for Epidemiologists

Optimizing and benchmarking polygenic risk scores with GWAS summary statistics

Genetic Risk Scores for the Clinical Rheumatologist

Evaluation of polygenic risk scores to differentiate between type 1 and type 2 diabetes

A Machine-Learning Heuristic to Improve Gene Score Prediction of Polygenic Traits

Bayesian approach to assessing population differences in genetic risk of disease with application to prostate cancer

Leveraging Effect Size Distributions to Improve Polygenic Risk Scores Derived from Summary Statistics of Genome-Wide Association Studies.