Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries

Zhili Zheng, Shouye Liu, Julia Sidorenko, Ying Wang, Tian Lin, Loic Yengo, Patrick Turley, Alireza Ani, Rujia Wang, Ilja M. Nolte, Harold Snieder, Raul Aguirre-Gamboa, Patrick Deelen, Lude Franke, Jan A. Kuivenhoven, Esteban A. Lopera Maya, Serena Sanna, Morris A. Swertz, Judith M. Vonk, Cisca Wijmenga, Jian Yang, Naomi R. Wray, Michael E. Goddard, Peter M. Visscher, Jian Zeng, LifeLines Cohort Study
DOI: https://doi.org/10.1038/s41588-024-01704-y
IF: 30.8
2024-05-02
Nature Genetics
Abstract: We develop a method, SBayesRC, that integrates genome-wide association study (GWAS) summary statistics with functional genomic annotations to improve polygenic prediction of complex traits. Our method is scalable to whole-genome variant analysis and refines signals from functional annotations by allowing them to affect both causal variant probability and causal effect distribution. We analyze 50 complex traits and diseases using ∼7 million common single-nucleotide polymorphisms (SNPs) and 96 annotations. SBayesRC improves prediction accuracy by 14% in European ancestry and up to 34% in cross-ancestry prediction compared to the baseline method SBayesR, which does not use annotations, and outperforms other methods, including LDpred2, LDpred-funct, MegaPRS, PolyPred-S and PRS-CSx. Investigation of factors affecting prediction accuracy identifies a significant interaction between SNP density and annotation information, suggesting whole-genome sequence variants with annotations may further improve prediction. Functional partitioning analysis highlights a major contribution of evolutionary constrained regions to prediction accuracy and the largest per-SNP contribution from nonsynonymous SNPs.
genetics & heredity
What problem does this paper attempt to address?
This paper aims to improve the accuracy of polygenic prediction for complex traits and diseases, particularly across ancestral populations. Specifically:

1. **Improving prediction accuracy**: Existing polygenic scores (PGSs) have limited prediction accuracy for most complex diseases, and accuracy drops significantly when they are applied across ancestral populations. To address this, the researchers developed a new method, SBayesRC, which aims to improve polygenic prediction by integrating functional genomic annotations with whole-genome variant analysis.
2. **Utilizing functional genomic annotations**: Functional genomic annotations can help distinguish likely causal SNPs (single-nucleotide polymorphisms) from non-causal SNPs, thereby improving prediction accuracy. Existing methods usually consider only a subset of common variants (such as SNPs from genotyping arrays or the HapMap3 panel), which may lose information. SBayesRC analyzes all imputed common SNPs and combines them with functional annotations to capture causal effects more comprehensively.
3. **Handling linkage disequilibrium (LD) differences**: In cross-ancestry prediction, differences in LD between the GWAS population and the target population reduce prediction accuracy. SBayesRC handles these differences better by introducing a low-rank model and a multi-component, annotation-dependent mixture prior, thereby improving the accuracy of cross-ancestry prediction.
4. **Exploring factors affecting prediction accuracy**: The researchers also examined other factors that affect prediction accuracy, including SNP density, the number of functional annotations, GWAS sample size, minor allele frequency (MAF) and LD characteristics. The study found a significant interaction between SNP density and annotation information, and that using more SNPs and more annotation data can significantly improve prediction accuracy.

In summary, the main purpose of this paper is to improve polygenic prediction accuracy for complex traits and diseases across ancestral populations by developing a new statistical method (SBayesRC) that integrates functional genomic annotations with whole-genome variant analysis.
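
To make the idea of an annotation-dependent mixture prior more concrete, the following is a minimal illustrative sketch, not the SBayesRC implementation. It assumes a softmax link from annotations to mixture-component probabilities as a simple stand-in (this summary does not state the paper's actual link function). The component variance scales, the two annotation columns, the weights and the intercepts are all hypothetical values chosen for illustration. The sketch only shows how annotations can shift both the probability that a SNP has a nonzero effect and the distribution of that effect across variance components.

```python
import math

# Hypothetical variance scales for the mixture components. Component 0 is the
# "zero effect" component; the others are increasingly large effect classes.
# These numbers are illustrative assumptions, not values from the paper.
SCALES = [0.0, 0.01, 0.1, 1.0]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def component_probs(annotations, weights, intercepts):
    """Prior probability of each mixture component for one SNP.

    annotations: annotation values for the SNP (here 0/1 indicators).
    weights[k][a]: assumed effect of annotation a on the score of component k.
    intercepts[k]: baseline score of component k for an unannotated SNP.
    """
    scores = [
        intercepts[k] + sum(w * x for w, x in zip(weights[k], annotations))
        for k in range(len(intercepts))
    ]
    return softmax(scores)


def expected_prior_variance(probs, sigma2=1.0):
    """Expected effect variance under the mixture: sigma2 * sum_k p_k * scale_k."""
    return sigma2 * sum(p * s for p, s in zip(probs, SCALES))


# Hypothetical setup with two annotations: [conserved, nonsynonymous].
intercepts = [4.0, 1.0, 0.0, -1.0]  # most unannotated SNPs fall in component 0
weights = [
    [-1.0, -2.0],  # annotations make "zero effect" less likely
    [0.0, 0.5],
    [0.5, 1.0],
    [0.5, 1.5],    # and shift mass toward larger-effect components
]

unannotated = component_probs([0, 0], weights, intercepts)
annotated = component_probs([1, 1], weights, intercepts)

print("P(zero effect), unannotated SNP:", round(unannotated[0], 3))
print("P(zero effect), annotated SNP:  ", round(annotated[0], 3))
print("Expected prior variance ratio:  ",
      round(expected_prior_variance(annotated) / expected_prior_variance(unannotated), 1))
```

In this toy setup, an annotated SNP receives a lower prior probability of having no effect and a larger expected effect variance than an unannotated one. This corresponds to the two routes described above, causal variant probability and causal effect distribution.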