Multiple testing when many $p$-values are uniformly conservative, with application to testing qualitative interaction in educational interventions

Qingyuan Zhao, Dylan S. Small, Weijie Su
DOI: https://doi.org/10.48550/arXiv.1703.09787
2017-08-27
Abstract: In the evaluation of treatment effects, it is of major policy interest to know if the treatment is beneficial for some and harmful for others, a phenomenon known as qualitative interaction. We formulate this question as a multiple testing problem with many conservative null $p$-values, in which the classical multiple testing methods may lose power substantially. We propose a simple technique, conditioning, to improve the power. A crucial assumption we need is uniform conservativeness, meaning that for any conservative $p$-value $p$, the conditional distribution $(p/\tau)\,|\,p \le \tau$ is stochastically larger than the uniform distribution on $(0,1)$ for any $\tau$. We show this property holds for one-sided tests in a one-dimensional exponential family (e.g., testing for qualitative interaction) as well as testing $|\mu|\le\eta$ using a statistic $X \sim \mathrm{N}(\mu,1)$ (e.g., testing for practical importance with threshold $\eta$). We propose an adaptive method to select the threshold $\tau$. Our theoretical and simulation results suggest the proposed tests gain significant power when many $p$-values are uniformly conservative and lose little power when no $p$-value is uniformly conservative. We apply our method to two educational intervention datasets.
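The conditioning idea in the abstract can be illustrated with a minimal sketch. It assumes independent $p$-values, a threshold $\tau$ fixed in advance, and Bonferroni as the correction applied to the rescaled values $p/\tau$. The function name `conditional_bonferroni` and the example $p$-values are hypothetical, chosen only for illustration, and the paper's adaptive selection of $\tau$ is not reproduced here.

```python
def conditional_bonferroni(p_values, tau=0.5, alpha=0.05):
    """Illustrative sketch: Bonferroni applied to p/tau among p-values <= tau.

    Assumes independent p-values and a tau fixed before seeing the data.
    Returns the indices of the rejected hypotheses.
    """
    # Keep only the hypotheses whose p-value falls at or below the threshold.
    selected = [i for i, p in enumerate(p_values) if p <= tau]
    if not selected:
        return []
    # Bonferroni level is split over the selected hypotheses only.
    level = alpha / len(selected)
    # Compare the rescaled (conditional) p-value p/tau with that level.
    return [i for i in selected if p_values[i] / tau <= level]


# Hypothetical data: two small p-values and eight very conservative ones.
p_values = [0.008, 0.03, 0.62, 0.81, 0.93, 0.97, 0.88, 0.75, 0.99, 0.70]

conditional = conditional_bonferroni(p_values, tau=0.5, alpha=0.05)
plain = [i for i, p in enumerate(p_values) if p <= 0.05 / len(p_values)]

print("conditional Bonferroni rejects:", conditional)
print("plain Bonferroni rejects:", plain)
```

In this toy input, plain Bonferroni compares each $p$-value with $0.05/10$ and rejects nothing, while the conditional version discards the eight large $p$-values, compares $0.008/0.5 = 0.016$ with $0.05/2$, and rejects the first hypothesis.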
Methodology