Hierarchical joint analysis of marginal summary statistics—Part II: High‐dimensional instrumental analysis of omics data

Lai Jiang, Jiayi Shen, Burcu F. Darst, Christopher A. Haiman, Nicholas Mancuso, David V. Conti
DOI: https://doi.org/10.1002/gepi.22577
2024-06-19
Genetic Epidemiology
Abstract: Instrumental variable (IV) analysis has been widely applied in epidemiology to infer causal relationships using observational data. Genetic variants can also be viewed as valid IVs in Mendelian randomization and transcriptome‐wide association studies. However, most multivariate IV approaches cannot scale to high‐throughput experimental data. Here, we extend our previous work, a hierarchical model that jointly analyzes marginal summary statistics (hJAM), into a scalable framework (SHA‐JAM) that can be applied to a large number of intermediates and a large number of correlated genetic variants, situations often encountered in modern experiments leveraging omic technologies. SHA‐JAM aims to estimate the conditional effect of high‐dimensional risk factors on an outcome by incorporating estimates from association analyses of single‐nucleotide polymorphism (SNP)‐intermediate or SNP‐gene expression as prior information in a hierarchical model. Results from extensive simulation studies demonstrate that SHA‐JAM yields a higher area under the receiver operating characteristic curve (AUC), a lower mean‐squared error of the estimates, and a much faster computation speed, compared to an existing approach for similar analyses. In two applied examples for prostate cancer, we investigated metabolite and transcriptome associations, respectively, using summary statistics from a GWAS for prostate cancer with more than 140,000 men and high‐dimensional, publicly available summary data for metabolites and transcriptomes.
genetics & heredity, mathematical & computational biology
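
The abstract's core idea, using SNP‐intermediate estimates as prior information when relating correlated SNPs to an outcome, can be illustrated with a small numerical sketch. This is not the SHA‐JAM implementation: it assumes standardized genotypes, a known SNP correlation (LD) matrix `R`, and a low‐dimensional setting where ordinary least squares suffices, whereas the paper targets the high‐dimensional case. The function name `conditional_intermediate_effects` and the symbols `b_marginal`, `R`, `A`, and `pi` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def conditional_intermediate_effects(b_marginal, R, A):
    """Toy estimate of intermediate-to-outcome effects from summary statistics.

    Assumed model (illustrative only):
        b_marginal = R @ beta_cond   (marginal vs. conditional SNP effects)
        beta_cond  = A @ pi          (SNP effects act through intermediates)
    where R is the SNP correlation matrix and A holds SNP-intermediate effects.
    """
    # Cholesky factor of the LD matrix: R = L @ L.T, with L lower triangular.
    L = np.linalg.cholesky(R)
    # Whitening step: L^{-1} b = L.T @ A @ pi, so the SNPs become uncorrelated.
    z = np.linalg.solve(L, b_marginal)
    X = L.T @ A
    # Ordinary least squares for pi (a stand-in for a high-dimensional estimator).
    pi_hat, *_ = np.linalg.lstsq(X, z, rcond=None)
    return pi_hat


# Noise-free toy data: 6 correlated SNPs, 2 intermediates.
rng = np.random.default_rng(0)
m, k = 6, 2
genotypes = rng.normal(size=(200, m))
R = np.corrcoef(genotypes, rowvar=False)   # toy LD matrix
A = rng.normal(size=(m, k))                # toy SNP-intermediate effects
pi_true = np.array([0.5, -0.2])            # toy intermediate-outcome effects
b_marginal = R @ A @ pi_true               # implied marginal SNP-outcome effects

pi_hat = conditional_intermediate_effects(b_marginal, R, A)
```

Because the toy data contain no noise, `pi_hat` matches `pi_true` up to floating‐point error. With real summary statistics, a regularized or sparse estimator would be needed once the number of intermediates grows large.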