Improved heritability partitioning and enrichment analyses using summary statistics with graphREML

Hui Li, Tushar Kamath, Rahul Mazumder, Xihong Lin, Luke O'Connor

DOI: https://doi.org/10.1101/2024.11.04.24316716

2024-11-05

Abstract: Heritability enrichment analysis using data from Genome-Wide Association Studies (GWAS) is often used to understand the functional basis of genetic architecture. Stratified LD score regression (S-LDSC) is a widely used method-of-moments estimator for heritability enrichment, but S-LDSC has low statistical power compared with likelihood-based approaches. We introduce graphREML, a precise and powerful likelihood-based heritability partitioning and enrichment analysis method. graphREML operates on GWAS summary statistics and linkage disequilibrium graphical models (LDGMs), whose sparsity makes likelihood calculations tractable. We validate our method using extensive simulations and in analyses of a wide range of real traits. On average across traits, graphREML produces enrichment estimates that are concordant with S-LDSC, indicating that both methods are unbiased; however, graphREML identifies 2.5 times more significant trait-annotation enrichments, demonstrating greater power compared to the moment-based S-LDSC approach. graphREML can also more flexibly model the relationship between the annotations of a SNP and its heritability, producing well-calibrated estimates of per-SNP heritability.
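The abstract says that LDGM sparsity makes likelihood calculations tractable. A small sketch can show why. It assumes a standard summary-statistic model, not necessarily the exact one graphREML implements:

- Marginal z-scores follow z ~ N(0, R + n R D R).
- R is the LD correlation matrix, n is the GWAS sample size, and D = diag(σ²_j) holds the per-SNP heritabilities.

If R is approximated by P^{-1} for a sparse LDGM precision matrix P, then Cov(z) = P^{-1} M P^{-1} with M = P + nD. The log-likelihood then needs only sparse factorizations of P and M, never the dense R.

Some parts of the sketch are hypothetical:

- The function names `sparse_logdet` and `summary_stat_loglik` are invented for illustration.
- The tridiagonal P is a toy example; a real LDGM precision matrix is not tridiagonal.
- SciPy's `splu` stands in for whatever sparse factorization graphREML actually uses.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def sparse_logdet(A):
    """Log-determinant of a sparse symmetric positive-definite matrix via sparse LU.

    L has a unit diagonal and the permutations only contribute a sign, so
    |det(A)| = prod |U_ii|; for an SPD matrix det(A) > 0, hence
    log det(A) = sum(log |U_ii|).
    """
    lu = splu(sp.csc_matrix(A))
    return np.sum(np.log(np.abs(lu.U.diagonal())))


def summary_stat_loglik(z, P, sigma2, n):
    """Gaussian log-likelihood of marginal z-scores using only sparse matrices.

    Assumed (illustrative) model: z ~ N(0, R + n R D R), R = P^{-1}, D = diag(sigma2).
    Then Cov(z) = P^{-1} M P^{-1} with M = P + n D, which is as sparse as P.
    """
    P = sp.csc_matrix(P)
    M = sp.csc_matrix(P + n * sp.diags(sigma2))
    y = P @ z                       # y = P z has covariance M
    quad = y @ splu(M).solve(y)     # z' Cov(z)^{-1} z = y' M^{-1} y
    logdet_cov = sparse_logdet(M) - 2.0 * sparse_logdet(P)
    k = len(z)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet_cov + quad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, n = 5, 10_000
    # Hypothetical toy precision matrix: tridiagonal and diagonally dominant, hence SPD.
    P = sp.diags([np.full(k - 1, -0.4), np.full(k, 1.5), np.full(k - 1, -0.4)],
                 [-1, 0, 1], format="csc")
    sigma2 = np.full(k, 1e-4)  # hypothetical per-SNP heritabilities
    z = rng.normal(size=k)
    print(summary_stat_loglik(z, P, sigma2, n))
```

Working with y = Pz replaces the dense covariance of z with the sparse matrix M. The cost therefore depends on the sparsity of P and M, and on the fill-in of their factorizations, rather than on a dense matrix over all SNPs.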

Genetic and Genomic Medicine

What problem does this paper attempt to address?

The main problem this paper addresses is the low statistical power of heritability partitioning and enrichment analyses based on genome-wide association study (GWAS) data. The existing stratified linkage disequilibrium score regression (S-LDSC) method is easy to use and works with publicly available summary statistics. However, its statistical power is low, so many important enrichment signals may go undetected. To address this, the authors propose graphREML, a likelihood-based method for heritability partitioning and enrichment analysis. It uses GWAS summary statistics together with linkage disequilibrium graphical models (LDGMs). With this method, the authors aim to:

1. **Improve statistical power**: Compared with S-LDSC, graphREML estimates heritability enrichment more precisely and with substantially higher statistical power, so it detects more significant enrichment signals.
2. **Handle overlapping annotations**: graphREML directly models the relationship between a SNP's annotations and its heritability. It therefore handles overlapping annotations without requiring individual-level data.
3. **Ensure non-negative heritability estimates**: graphREML uses a non-negative inverse link function, so every per-SNP heritability estimate is non-negative. This avoids the negative heritability estimates that can occur with S-LDSC.

### Formula Explanation

The formulas mentioned in the paper are as follows. The per-SNP heritability of SNP \( j \) is:

\[ \sigma^2_j = g^{-1}(\eta_j), \quad \text{where} \quad \eta_j = a_j^\top \tau \]

where \( a_j \) is the vector of annotation values for SNP \( j \), \( \tau \) is a vector of unknown parameters, and \( g^{-1}(\cdot) \) is a non-negative inverse link function.

Through these improvements, graphREML improves both the precision and the statistical power of heritability partitioning and enrichment analysis. It also remains robust to complex annotation structures, such as overlapping annotations.
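The following sketch illustrates the per-SNP heritability model in the formula above and contrasts it with S-LDSC's linear model. It is a minimal illustration, not the graphREML code, and several choices are assumptions:

- The softplus function log(1 + e^x) is an assumed choice of the non-negative inverse link g^{-1}. The summary only states that the link is non-negative.
- The "linear" option mimics S-LDSC's additive per-SNP model.
- The helper names, annotation matrix and τ values are hypothetical.

Enrichment of a binary annotation uses the usual definition: the annotation's share of total heritability divided by its share of SNPs.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x)); never negative."""
    return np.logaddexp(0.0, x)


def per_snp_h2(A, tau, link="softplus"):
    """Per-SNP heritability sigma2_j = g^{-1}(a_j^T tau) for every SNP j.

    A   : (n_snps, n_annotations) annotation matrix; column 0 is an all-ones base annotation
    tau : (n_annotations,) parameter vector
    link: "softplus" (assumed non-negative inverse link) or
          "linear" (S-LDSC-style additive model, which can return negative values)
    """
    eta = A @ tau
    if link == "softplus":
        return softplus(eta)
    if link == "linear":
        return eta
    raise ValueError(f"unknown link: {link!r}")


def enrichment(A, sigma2, c):
    """(Share of total heritability in annotation c) / (share of SNPs in annotation c)."""
    in_c = A[:, c] > 0
    h2_share = sigma2[in_c].sum() / sigma2.sum()
    snp_share = in_c.mean()
    return h2_share / snp_share


if __name__ == "__main__":
    # Hypothetical toy data: 8 SNPs, a base annotation and two overlapping binary annotations.
    A = np.array([
        [1, 1, 0],
        [1, 1, 1],
        [1, 0, 1],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
    ], dtype=float)
    tau = np.array([-7.0, 1.5, 0.5])  # hypothetical parameter values
    sigma2 = per_snp_h2(A, tau)
    print("per-SNP h2:", sigma2)
    print("enrichment of annotation 1:", enrichment(A, sigma2, 1))
    # The linear model can give negative per-SNP heritability for negative tau entries.
    print("linear model:", per_snp_h2(A, np.array([1e-4, -3e-4, 0.0]), link="linear"))
```

Because a_j^T τ sums contributions from every annotation a SNP carries, overlapping annotations enter the model jointly rather than one at a time. The softplus link keeps each σ²_j non-negative whatever the sign of η_j.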
