A consensus variant-to-function score to functionally prioritize variants for disease
Tabassum Fabiha, Ivy Evergreen, Soumya Kundu, Anusri Pampari, Sergey Abramov, Alexandr Boytsov, Kari Strouse, Katherine Dura, Weixiang Fang, Gaspard Kerner, John Butts, Thahmina Ali, Andreas Gschwind, Kristy S Mualim, Jill E Moore, Zhiping Weng, Jacob Ulirsch, Hongkai E Ji, Jeff Vierstra, Timothy E. Reddy, Stephen B Montgomery, Jesse Engreitz, Anshul Kundaje, Ryan Tewhey, Alkes Price, Kushal Dey
DOI: https://doi.org/10.1101/2024.11.07.622307
2024-11-10
Abstract: Identifying and functionally characterizing causal disease variants in genome-wide association studies (GWAS) remains a pressing challenge. Here, we construct a consensus variant-to-function (cV2F) score that assigns a single value to each common single-nucleotide variant in the genome and helps to predict and characterize causal disease variants. The cV2F score leverages features reflecting experimentally measured and computationally predicted variant-level function (e.g., allelic imbalance and sequence-based deep learning models) and element-level function (e.g., predicted enhancers), and learns optimal combinations of features by training a gradient boosting model on GWAS fine-mapping results. The cV2F score attained an AUPRC of 0.822 at identifying held-out fine-mapped variants. Variants with high cV2F scores are highly enriched for heritability (14.2x, s.e. 0.5) across 66 diseases/traits, are uniquely informative for disease heritability, and are highly predictive of variants implicated by reporter assays; cV2F substantially outperforms previous variant-to-function scores on all of these metrics. GWAS fine-mapping of 110 diseases/traits informed by cV2F identified 14.3% more confidently fine-mapped (PIP > 0.95) variants than non-functionally informed fine-mapping. We further constructed tissue/cell line-specific cV2F scores that prioritize variants based on regulatory potential in specific tissues/cell lines, attaining high heritability enrichment for tissue-related diseases/traits (15.6x, s.e. 2.3) while providing independent information (average correlation of 0.27 with the primary cV2F score). We highlight examples of GWAS loci for which cV2F pinpoints causal variants with high confidence and elucidates their functional role.
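For readers unfamiliar with the AUPRC metric quoted in the abstract, the following is a minimal sketch of one common estimator, average precision, applied to a toy ranking of variants. The abstract does not state how the authors computed AUPRC, so the estimator choice, the function name `average_precision`, and the toy labels and scores are all assumptions made for illustration; they are not the paper's data or code.

```python
def average_precision(labels, scores):
    """Average precision, a common estimator of the area under the
    precision-recall curve (assumed here; the paper's exact method is not stated).

    labels: 1 for a positive (e.g. a fine-mapped variant), 0 otherwise.
    scores: predicted scores, higher meaning more likely positive.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank variants from highest to lowest score. The sort is stable, so
    # tied scores keep their input order (a simplification).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Hypothetical toy data: six variants, three of them positives.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_ap = average_precision(toy_labels, toy_scores)
```

In this toy ranking the positives fall at ranks 1, 3, and 6, so the result is the mean of 1/1, 2/3, and 3/6. A score that ranks every positive above every negative would give 1.0.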
Genetics