Abstract: Population differences in risk of disease are common, but the potential genetic basis for these differences is not well understood. A standard approach is to compare genetic risk across populations by testing for mean differences in polygenic scores, but existing studies that use this approach do not account for the statistical noise in effect estimates (i.e., the GWAS betas) that arises from the finite sample size of GWAS training data. Here, we show using Bayesian polygenic score methods that the level of uncertainty in estimates of genetic risk differences across populations is highly dependent on the GWAS training sample size, the polygenicity (number of causal variants), and the genetic distance (F_ST) between the populations considered. We derive a Wald test for formally assessing the difference in genetic risk across populations, which we show to have calibrated type 1 error rates under the simplifying assumption that all SNPs are independent, an assumption we satisfy in practice using linkage disequilibrium (LD) pruning. We further provide closed-form expressions for assessing the uncertainty in estimates of relative genetic risk across populations under the special case of an infinitesimal genetic architecture. We suggest that for many complex traits and diseases, particularly those with more polygenic architectures, current GWAS sample sizes are insufficient to detect moderate differences in genetic risk across populations, though more substantial differences in relative genetic risk (relative risk > 1.5) can be detected. We show that conventional approaches that do not account for sampling error from the training sample, such as a simple t-test, have very high type 1 error rates. Applying our approach to prostate cancer, we demonstrate a higher genetic risk in men of African ancestry, with progressively lower risk in men of European and then East Asian ancestry.
Many diseases and complex traits, such as prostate cancer, exhibit differences in incidence across populations. Yet the potential contribution of genetic factors towards such disparities is unclear. Polygenic scores summarise genetic effects across the genome and can in principle provide a valuable tool for assessing and comparing disease risk across populations. In practice, current approaches based on polygenic scores assume that such scores measure genetic risk of disease without error, and thus do not account for the uncertainty that arises when the score is constructed from a finite genome-wide association study (GWAS) training sample, which can be substantial. We introduce a Bayesian approach based on the LDpred2 polygenic score model that fully accounts for training sample uncertainty, and we propose a Wald test for formally testing such genetic risk differences across populations. Simulations show that the method properly controls type 1 error when SNPs are independent (achieved by pruning), and that statistical power is sensitive to both the genetic architecture (heritability and polygenicity) and the training sample size. Applied to prostate cancer, this framework identifies a higher genetic risk in men of African ancestry, with progressively lower risk in men of European and then East Asian ancestry.
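
To make the idea of the test concrete, the following is a minimal sketch and not the paper's actual derivation. It assumes independent SNPs (as after LD pruning), an additive model, and allele frequencies known without error. It also assumes that a Bayesian polygenic score method such as LDpred2 has already supplied a posterior mean and posterior variance for each SNP effect. Under those assumptions, the difference in mean genetic score between populations A and B is D = sum_j 2 (p_Aj - p_Bj) beta_j, and a Wald statistic D_hat^2 / Var(D) can be compared with a chi-square distribution with 1 degree of freedom. The function name and argument names are hypothetical.

```python
import math


def wald_test_risk_difference(beta_mean, beta_var, freq_a, freq_b):
    """Wald test for a difference in mean polygenic score between two populations.

    Illustrative assumptions (not taken from the paper's derivation):
    SNPs are independent, effects are additive, allele frequencies are
    known exactly, and beta_mean / beta_var are per-SNP posterior means
    and variances from some Bayesian polygenic score method.
    """
    if not (len(beta_mean) == len(beta_var) == len(freq_a) == len(freq_b)):
        raise ValueError("all inputs must have the same length")

    diff = 0.0  # estimated difference in mean genetic score (A minus B)
    var = 0.0   # variance of that estimate due to uncertainty in the effects
    for m, v, pa, pb in zip(beta_mean, beta_var, freq_a, freq_b):
        w = 2.0 * (pa - pb)   # difference in expected allele dosage
        diff += w * m
        var += w * w * v      # independent SNPs: variances simply add

    if var <= 0.0:
        raise ValueError("variance of the difference is zero; test undefined")

    stat = diff * diff / var
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return diff, var, stat, p_value


# Toy inputs, made up purely for illustration
diff, var, stat, p_value = wald_test_risk_difference(
    beta_mean=[0.1, -0.05],
    beta_var=[0.01, 0.01],
    freq_a=[0.5, 0.2],
    freq_b=[0.3, 0.4],
)
print(diff, var, stat, p_value)
```

A test that ignored the per-SNP effect variances would treat the score difference as if it were known exactly. The summary above identifies that omission as the source of the inflated type 1 error in conventional comparisons.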

Bayesian approach to assessing population differences in genetic risk of disease with application to prostate cancer
