A constructive approach to selective risk control

Zijun Gao, Wenjie Hu, Qingyuan Zhao
2024-01-30
Abstract: Many modern applications require the use of data to both select the statistical tasks and make valid inference after selection. In this article, we provide a unifying approach to control for a class of selective risks. Our method is motivated by a reformulation of the celebrated Benjamini-Hochberg (BH) procedure for multiple hypothesis testing as the iterative limit of the Benjamini-Yekutieli (BY) procedure for constructing post-selection confidence intervals. Although several earlier authors have made noteworthy observations related to this, our discussion highlights that (1) the BH procedure is precisely the fixed-point iteration of the BY procedure; (2) the fact that the BH procedure controls the false discovery rate is almost an immediate corollary of the fact that the BY procedure controls the false coverage-statement rate. Building on this observation, we propose a constructive approach to control extra-selection risk (selection made after decision) by iterating decision strategies that control the post-selection risk (decision made after selection), and show that many previous methods and results are special cases of this general framework. We further extend this approach to problems with multiple selective risks and demonstrate how new methods can be developed. Our development leads to two surprising results about the BH procedure: (1) in the context of one-sided location testing, the BH procedure not only controls the false discovery rate at the null but also at other locations for free; (2) in the context of permutation tests, the BH procedure with exact permutation p-values can be well approximated by a procedure which only requires a total number of permutations that is almost linear in the total number of hypotheses.
Methodology, Statistics Theory, Applications, Computation
What problem does this paper attempt to address?
The problem this paper addresses is how, in modern applications, to use data both to select statistical tasks and to make valid inferences after selection. Specifically, the article proposes a unified method for controlling a class of selective risks: the risks incurred when statistical inference is carried out after hypotheses or models have been selected using the same data.

### Main problem decomposition:

1. **Selection problem**:
   - How can potential signals be selected from a large number of candidates for further research?
   - For example, in multiple hypothesis testing, the goal is to reject as many hypotheses as possible while controlling a multiple-testing error rate such as the false discovery rate (FDR).
2. **Inference problem after selection**:
   - How can valid statistical inference be made after the data have been used to select some tasks?
   - A typical example is constructing confidence intervals for parameters that were selected using the data. This problem has seen many methodological breakthroughs over the past two decades.

### Research background and motivation:

Classical statistical theory usually assumes that the scientific hypotheses and statistical models are fixed before the data are analyzed. In practice, however, data analysts often want to use the data to choose which models or hypotheses to examine. This selection step can invalidate traditional inference procedures, and a large literature on "selective inference" has emerged in response.

### Core contributions of the paper:

1. **New interpretation of the BH procedure**:
   - The paper reinterprets the Benjamini-Hochberg (BH) procedure as the iterative limit of the Benjamini-Yekutieli (BY) procedure.
   - The BH procedure controls the false discovery rate (FDR) in the selection problem, while the BY procedure controls the false coverage-statement rate (FCR) in the post-selection problem.
2. **Constructive method**:
   - The paper proposes controlling the extra-selection risk (selection made after decision) by iterating decision strategies that control the post-selection risk (decision made after selection).
   - It shows that many previous methods and results are special cases of this general framework.
3. **Extension and application**:
   - The method is extended to problems with multiple selective risks, and the paper shows how new methods can be developed from it.
   - Two surprising results about the BH procedure are obtained: in one-sided location testing, it controls the FDR not only at the null but also at other locations; in permutation testing, BH with exact permutation p-values can be well approximated using a total number of permutations that is almost linear in the number of hypotheses.

### Formula representation:

- The p-value of the hypothesis test is defined as \( P_i = \Phi(X_i) \), where \( X_i \sim N(\theta_i, 1) \) and \( \Phi \) is the cumulative distribution function of the standard normal distribution.
- The decision rule of the BH procedure is: reject the null hypothesis \( H_i: \theta_i \geq 0 \) if \( P_i \leq P^* \), where \( P^* = P_{(I^*)} \) and \( I^* = \max\{i : P_{(i)} \leq \frac{i}{m} q\} \).
- The condition for controlling FDR is:
  \[ \text{FDR} = E\left[\frac{\sum_{i=1}^m S_i^* 1\{\theta_i \geq 0\}}{1 \vee \sum_{i=1}^m S_i^*}\right] \leq q \]
  where \( S_i^* = 1\{P_i \leq P^*\} \).

In this way, the paper provides a systematic framework for selective and post-selection inference, especially for multiple hypothesis testing and the construction of selective confidence intervals.
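The BH decision rule, and the abstract's description of BH as a fixed-point iteration, can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`normal_cdf`, `bh_step_up`, `bh_fixed_point`) and the example observations are hypothetical, and the sketch assumes the one-sided p-value is the standard normal CDF evaluated at \( X_i \). The iteration is a simplified reading of the fixed-point idea: start with every hypothesis selected, then repeatedly keep only those with \( P_i \leq q|S|/m \), where \( |S| \) is the current number selected.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, used here as the one-sided p-value of X ~ N(theta, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bh_step_up(pvals, q):
    """BH step-up rule: reject H_i when P_i <= P_(I*), I* = max{i : P_(i) <= i*q/m}."""
    m = len(pvals)
    sorted_p = sorted(pvals)
    i_star = 0
    for i, p in enumerate(sorted_p, start=1):
        if p <= q * i / m:
            i_star = i  # keep the largest index satisfying the condition
    if i_star == 0:
        return [False] * m  # no hypothesis is rejected
    p_star = sorted_p[i_star - 1]
    return [p <= p_star for p in pvals]


def bh_fixed_point(pvals, q):
    """Iterate S <- {i : P_i <= q*|S|/m} from S = all hypotheses until S stops changing."""
    m = len(pvals)
    selected = [True] * m
    while True:
        threshold = q * sum(selected) / m
        new_selected = [p <= threshold for p in pvals]
        if new_selected == selected:
            return selected
        selected = new_selected  # the selected set can only shrink, so this terminates


# Hypothetical observations; strongly negative values are evidence against theta_i >= 0.
x = [-3.5, -2.9, -2.2, -0.4, 0.1, 0.8, 1.3, -1.0]
pvals = [normal_cdf(v) for v in x]

rejected_step_up = bh_step_up(pvals, q=0.1)
rejected_fixed_point = bh_fixed_point(pvals, q=0.1)
print(rejected_step_up)
print(rejected_step_up == rejected_fixed_point)
```

The two functions return the same rejection mask on this example. The fixed-point version never sorts the p-values; it only counts how many fall below the current threshold, which is the form that the iterative view builds on.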