SEMI-PARAMETRIC COVARIATE-MODULATED LOCAL FALSE DISCOVERY RATE FOR GENOME-WIDE ASSOCIATION STUDIES

Rong W. Zablocki, Richard A. Levine, Andrew J. Schork, Shujing Xu, Yunpeng Wang, Chun C. Fan, Wesley K. Thompson
DOI: https://doi.org/10.1101/183384
2017-08-31
Abstract: While genome-wide association studies (GWAS) have discovered thousands of risk loci for heritable disorders, so far even very large meta-analyses have recovered only a fraction of the heritability of most complex traits. Recent work utilizing variance components models has demonstrated that a larger fraction of the heritability of complex phenotypes is captured by the additive effects of SNPs than is evident only in loci surpassing genome-wide significance thresholds, typically set at a Bonferroni-inspired p ≤ 5 × 10⁻⁸. Procedures that control false discovery rate can be more powerful, yet these are still under-powered to detect the majority of non-null effects from GWAS. The current work proposes a novel Bayesian semi-parametric two-group mixture model and develops a Markov chain Monte Carlo (MCMC) algorithm for a covariate-modulated local false discovery rate (cmfdr). The probability of being non-null depends on a set of covariates via a logistic function, and the non-null distribution is approximated as a linear combination of B-spline densities, where the weight of each B-spline density depends on a multinomial function of the covariates. The proposed methods were motivated by work on a large meta-analysis of schizophrenia GWAS performed by the Psychiatric Genetics Consortium (PGC). We show that the new cmfdr model fits the PGC schizophrenia GWAS test statistics well, performing better than our previously proposed parametric gamma model for estimating the non-null density and substantially improving power over usual fdr. Using loci declared significant at cmfdr ≤ 0.20, we perform follow-up pathway analyses using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Homo sapiens pathways database. We demonstrate that the increased yield from the cmfdr model results in an improved ability to test for pathways associated with schizophrenia compared to using those SNPs selected according to usual fdr.
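As a rough illustration of the model structure described in the abstract, the Python sketch below evaluates a covariate-modulated local fdr for a single SNP at fixed parameter values. It is not the authors' MCMC algorithm, which estimates these parameters from data. The coefficient vectors `beta` and `gammas`, the knot vector, the choice to work with absolute z-scores under a folded standard normal null density, and all function names are assumptions made purely for illustration.

```python
import math


def bspline_basis(z, knots, i, degree):
    """Cox-de Boor recursion: value at z of the i-th B-spline of the given degree."""
    if degree == 0:
        return 1.0 if knots[i] <= z < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + degree] - knots[i]
    if d1 > 0:
        left = (z - knots[i]) / d1 * bspline_basis(z, knots, i, degree - 1)
    d2 = knots[i + degree + 1] - knots[i + 1]
    if d2 > 0:
        right = ((knots[i + degree + 1] - z) / d2
                 * bspline_basis(z, knots, i + 1, degree - 1))
    return left + right


def bspline_density(z, knots, i, degree):
    """B-spline rescaled to integrate to 1 over its support."""
    width = knots[i + degree + 1] - knots[i]
    return (degree + 1) / width * bspline_basis(z, knots, i, degree)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def softmax(scores):
    """Multinomial-logit weights; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cmfdr(z, x, beta, gammas, knots, degree=3):
    """Posterior null probability for one SNP (illustrative sketch only).

    z      : test statistic; its absolute value is used (assumption)
    x      : covariate vector, including an intercept term
    beta   : hypothetical logistic coefficients for P(non-null | x)
    gammas : hypothetical multinomial coefficients, one vector per B-spline
    knots  : knot vector; len(gammas) must equal len(knots) - degree - 1
    """
    z = abs(z)
    pi1 = 1.0 / (1.0 + math.exp(-dot(x, beta)))      # P(non-null | x)
    weights = softmax([dot(x, g) for g in gammas])   # mixture weights given x
    f1 = sum(w * bspline_density(z, knots, k, degree)
             for k, w in enumerate(weights))         # non-null density
    f0 = 2.0 * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # folded N(0,1)
    null_part = (1.0 - pi1) * f0
    return null_part / (null_part + pi1 * f1)


# Hypothetical setup: intercept plus one covariate, uniform knots on [0, 10].
knots = [float(t) for t in range(11)]
n_basis = len(knots) - 3 - 1
beta = [-2.0, 1.0]
gammas = [[0.0, 0.0] for _ in range(n_basis)]        # equal weights for all x
low = cmfdr(4.0, [1.0, 0.0], beta, gammas, knots)
high = cmfdr(4.0, [1.0, 2.0], beta, gammas, knots)
```

With these made-up values, the SNP whose covariate raises its prior probability of being non-null (`high`) receives a smaller cmfdr than the one with covariate zero (`low`) at the same test statistic. The knots must cover the range of observed statistics; otherwise both densities can vanish and the ratio is undefined.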