Beyond guilty by association at scale: searching for causal variants on the basis of genome-wide summary statistics
Zihuai He, Benjamin Chu, James Yang, Jiaqi Gu, Zhaomeng Chen, Linxi Liu, Tim Morrison, Michael E. Belloy, Xinran Qi, Nima Hejazi, Maya Mathur, Yann Le Guen, Hua Tang, Trevor Hastie, Iuliana Ionita-Laza, Chiara Sabatti, Emmanuel Candès
DOI: https://doi.org/10.1101/2024.02.28.582621
2024-05-02
Abstract: Understanding the causal genetic architecture of complex phenotypes is essential for future research into disease mechanisms and potential therapies. Here, we present a novel framework for genome-wide detection of sets of variants that carry non-redundant information on the phenotypes and are therefore more likely to be causal in a biological sense. Crucially, our framework requires only summary statistics obtained from standard genome-wide marginal association testing. The described approach, implemented in open-source software, is also computationally efficient, requiring less than 15 minutes on a single CPU to perform a genome-wide analysis. Through extensive genome-wide simulation studies, we show that the method can substantially outperform the usual two-stage procedure of marginal association testing followed by fine-mapping in both precision and recall. In an application to a meta-analysis of ten large-scale genetic studies of Alzheimer’s disease (AD), we identified 82 loci associated with AD, including 37 additional loci missed by the conventional GWAS pipeline. The identified putative causal variants achieve state-of-the-art agreement with massively parallel reporter assays and CRISPR-Cas9 experiments. Additionally, we applied the method in a retrospective analysis of 67 sets of large-scale GWAS summary statistics since 2013 for a variety of phenotypes. The results reveal the method’s capacity to robustly discover additional loci for polygenic traits and to pinpoint potential causal variants underpinning each locus beyond what the conventional GWAS pipeline provides, contributing to a deeper understanding of complex genetic architectures in post-GWAS analyses.
Genetics