Testing a Large Number of Composite Null Hypotheses Using Conditionally Symmetric Multidimensional Gaussian Mixtures in Genome-Wide Studies
Ryan Sun, Zachary R. McCaw, Xihong Lin
DOI: https://doi.org/10.1080/01621459.2024.2422124
IF: 4.369
2024-12-07
Journal of the American Statistical Association
Abstract: Causal mediation, pleiotropy, and replication analyses are three highly popular genetic study designs. Although these analyses address different scientific questions, the underlying statistical inference problems all involve large-scale testing of composite null hypotheses. The goal is to determine whether all null hypotheses (as opposed to at least one) in a set of individual tests should simultaneously be rejected. Recently, various methods have been proposed for each of these situations, including an appealing two-group empirical Bayes approach that calculates local false discovery rates (lfdr). However, lfdr estimation is difficult due to the need for multivariate density estimation. Furthermore, the multiple testing rules for the empirical Bayes lfdr approach can disagree with conventional frequentist z-statistics, which is troubling for a field that ubiquitously uses summary statistics. This work proposes a framework to unify two-group testing in genetic association composite null settings, the conditionally symmetric multidimensional Gaussian mixture model (csmGmm). The csmGmm is shown to have more robust operating characteristics than recently proposed alternatives. Crucially, the csmGmm also offers interpretability guarantees by harmonizing lfdr and z-statistic testing rules. We extend the base csmGmm to cover each of the mediation, pleiotropy, and replication settings, and we prove that the agreement between lfdr and z-statistic testing rules holds in each situation. We apply the model to a collection of translational lung cancer genetic association studies that motivated this work. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
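To make the composite null idea in the abstract concrete, the following is a minimal illustrative sketch, not the csmGmm itself. It assumes a toy two-dimensional setting with independent z-statistics, known mixture proportions, and a symmetric two-point alternative at +mu and -mu. The function names (`normal_pdf`, `alt_pdf`, `composite_null_lfdr`), the value of `mu`, and the proportions in `probs` are all hypothetical choices made for illustration. The lfdr is the posterior probability that at least one of the two individual nulls is true.

```python
import math


def normal_pdf(x, mean=0.0):
    """Density of a unit-variance normal distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def alt_pdf(x, mu):
    """Toy alternative density: equal-weight mixture centered at +mu and -mu."""
    return 0.5 * normal_pdf(x, mu) + 0.5 * normal_pdf(x, -mu)


def composite_null_lfdr(z1, z2, probs, mu=3.0):
    """Posterior probability that at least one of the two nulls holds.

    probs maps each configuration (h1, h2) in {0, 1}^2 to its prior
    probability, where h = 0 means null and h = 1 means non-null.
    Assumption: the two z-statistics are independent given the configuration.
    """
    def dens(z, h):
        return alt_pdf(z, mu) if h else normal_pdf(z)

    weighted = {
        cfg: p * dens(z1, cfg[0]) * dens(z2, cfg[1])
        for cfg, p in probs.items()
    }
    total = sum(weighted.values())
    # Every configuration except (1, 1) belongs to the composite null.
    return (total - weighted[(1, 1)]) / total


# Hypothetical mixture proportions; they must sum to 1.
probs = {(0, 0): 0.90, (0, 1): 0.04, (1, 0): 0.04, (1, 1): 0.02}

both_large = composite_null_lfdr(4.0, 4.0, probs)
one_large = composite_null_lfdr(4.0, 0.0, probs)
print(f"lfdr when both z-statistics are large: {both_large:.4f}")
print(f"lfdr when only one is large:           {one_large:.4f}")
```

In this toy setup, a large value in only one coordinate leaves the lfdr close to 1, because the configuration in which just that one effect is non-null still belongs to the composite null. Because the alternative is symmetric about zero, flipping the sign of either z-statistic leaves the lfdr unchanged.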
statistics & probability