Testing hypotheses on a tree: new error rates and controlling strategies

Marina Bogomolov, Christine B. Peterson, Yoav Benjamini, Chiara Sabatti
DOI: https://doi.org/10.48550/arXiv.1705.07529
2017-05-22
Methodology
Abstract: We introduce a multiple testing procedure (TreeBH) which addresses the challenge of controlling error rates at multiple levels of resolution. Conceptually, we frame this problem as the selection of hypotheses which are organized hierarchically in a tree structure. We describe a fast algorithm for the proposed sequential procedure, and prove that it controls relevant error rates given certain assumptions on the dependence among the p-values. Through simulations, we demonstrate that TreeBH offers the desired guarantees under a range of dependency structures (including one similar to that encountered in genome-wide association studies) and that it has the potential of gaining power over alternative methods. We also introduce a modified version of TreeBH which we prove to control the relevant error rates under any dependency structure. We conclude with two case studies: we first analyze data collected as part of the Genotype-Tissue Expression (GTEx) project, which aims to characterize the genetic regulation of gene expression across multiple tissues in the human body, and secondly, data examining the relationship between the gut microbiome and colorectal cancer.