Improved heritability partitioning and enrichment analyses using summary statistics with graphREML

Hui Li, Tushar Kamath, Rahul Mazumder, Xihong Lin, Luke O'Connor
DOI: https://doi.org/10.1101/2024.11.04.24316716
2024-11-05
Abstract: Heritability enrichment analysis using data from Genome-Wide Association Studies (GWAS) is often used to understand the functional basis of genetic architecture. Stratified LD score regression (S-LDSC) is a widely used method-of-moments estimator for heritability enrichment, but S-LDSC has low statistical power compared with likelihood-based approaches. We introduce graphREML, a precise and powerful likelihood-based heritability partition and enrichment analysis method. graphREML operates on GWAS summary statistics and linkage disequilibrium graphical models (LDGMs), whose sparsity makes likelihood calculations tractable. We validate our method using extensive simulations and in analyses of a wide range of real traits. On average across traits, graphREML produces enrichment estimates that are concordant with S-LDSC, indicating that both methods are unbiased; however, graphREML identifies 2.5 times more significant trait-annotation enrichments, demonstrating greater power compared to the moment-based S-LDSC approach. graphREML can also more flexibly model the relationship between the annotations of a SNP and its heritability, producing well-calibrated estimates of per-SNP heritability.
Genetic and Genomic Medicine
What problem does this paper attempt to address?
The paper aims to improve the statistical power of heritability partitioning and enrichment analysis based on genome-wide association study (GWAS) data. The existing stratified LD score regression (S-LDSC) method is easy to use and works with publicly available summary statistics, but its statistical power is low, so many important enrichment signals may go undetected. To address this, the authors propose graphREML, a likelihood-based heritability partitioning and enrichment analysis method that uses GWAS summary statistics and linkage disequilibrium graphical models (LDGMs). With this method, the authors aim to:

1. **Improve statistical power**: Compared with S-LDSC, graphREML estimates heritability enrichment more precisely and has substantially higher statistical power, so it detects more significant enrichment signals.
2. **Handle overlapping annotations**: graphREML directly models the relationship between a SNP's annotations and its heritability, which accommodates overlapping annotations without requiring individual-level data.
3. **Ensure non-negative heritability estimates**: graphREML uses a non-negative inverse link function, so the heritability estimate for each SNP is non-negative. This avoids the negative heritability estimates that can occur with S-LDSC.

### Formula Explanation

The per-SNP heritability of SNP \( j \) is modeled as:

\[ \sigma^2_j = g^{-1}(\eta_j), \quad \text{where} \quad \eta_j = a_j^\top \tau \]

where \( a_j \) is the vector of annotation values for SNP \( j \), \( \tau \) is the vector of unknown parameters, and \( g^{-1}(\cdot) \) is a non-negative inverse link function.
Together, these features improve both the accuracy and the statistical power of heritability partitioning and enrichment analysis, while remaining robust to complex annotation structures.
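As a rough illustration of the per-SNP heritability model in the formula above, the following Python sketch maps annotation vectors to non-negative per-SNP heritabilities and then computes the enrichment of one annotation. It is not the authors' implementation: the softplus function is only an assumed example of a non-negative inverse link, the function names `per_snp_heritability` and `enrichment` are hypothetical, and the annotation matrix and `tau` values are toy numbers chosen for readability rather than realistic scales. Enrichment is computed here using the common definition (share of heritability in an annotation divided by share of SNPs in it).

```python
import numpy as np


def softplus(eta):
    """Numerically stable log(1 + exp(eta)); the output is always non-negative.

    Assumption: used here only as one example of a non-negative inverse link.
    """
    return np.logaddexp(0.0, eta)


def per_snp_heritability(annotations, tau, inverse_link=softplus):
    """Compute sigma^2_j = g^{-1}(a_j^T tau) for every SNP j.

    annotations: (n_snps, n_annotations) array whose rows are the vectors a_j
    tau:         (n_annotations,) parameter vector
    """
    eta = annotations @ tau  # linear predictor eta_j = a_j^T tau
    return inverse_link(eta)


def enrichment(sigma2, in_annotation):
    """Share of heritability in an annotation divided by its share of SNPs."""
    frac_h2 = sigma2[in_annotation].sum() / sigma2.sum()
    frac_snps = in_annotation.mean()
    return frac_h2 / frac_snps


# Toy data: column 0 is a baseline annotation (all SNPs),
# column 1 is a binary annotation covering half of the SNPs.
annotations = np.array(
    [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 0.0],
     [1.0, 1.0]]
)
tau = np.array([-2.0, 3.0])  # hypothetical parameter values

sigma2 = per_snp_heritability(annotations, tau)
enr = enrichment(sigma2, annotations[:, 1] == 1)

print("per-SNP heritability:", sigma2)
print("enrichment of annotation 1:", enr)
```

Because the linear predictor passes through a non-negative inverse link, a negative entry in `tau` lowers a SNP's heritability toward zero but cannot make it negative, which is the property the summary contrasts with S-LDSC.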