Simplifying causal gene identification in GWAS loci

Marijn Schipper,Jacob C Ulirsch,Danielle Posthuma,stephan ripke,Karl Heilbron
DOI: https://doi.org/10.1101/2024.07.26.24311057
2024-07-29
Abstract:Genome-wide association studies (GWAS) help to identify disease-linked genetic variants, but pinpointing the most likely causal genes in GWAS loci remains challenging. Existing GWAS gene prioritization tools are powerful, but often use complex black box models trained on datasets containing unaddressed biases. Here we present CALDERA, a gene prioritization tool that achieves similar or better performance than state-of-the-art methods, but uses just 12 features and a simple logistic regression model with L1 regularization. We use a data-driven approach to construct a truth set of causal genes in 406 GWAS loci and correct for potential confounders. We demonstrate that CALDERA is well-calibrated in external datasets and prioritizes genes with expected properties, such as being mutation-intolerant (OR = 1.751 for pLI > 90%, P = 8.45x10-3). CALDERA facilitates the prioritization of potentially causal genes in GWAS loci and may help identify novel genetics-driven drug targets.
Genetic and Genomic Medicine
What problem does this paper attempt to address?