Abstract: Many modern applications require the use of data to both select the statistical tasks and make valid inference after selection. In this article, we provide a unifying approach to control for a class of selective risks. Our method is motivated by a reformulation of the celebrated Benjamini-Hochberg (BH) procedure for multiple hypothesis testing as the iterative limit of the Benjamini-Yekutieli (BY) procedure for constructing post-selection confidence intervals. Although several earlier authors have made noteworthy observations related to this, our discussion highlights that (1) the BH procedure is precisely the fixed-point iteration of the BY procedure; (2) the fact that the BH procedure controls the false discovery rate is almost an immediate corollary of the fact that the BY procedure controls the false coverage-statement rate. Building on this observation, we propose a constructive approach to control extra-selection risk (selection made after decision) by iterating decision strategies that control the post-selection risk (decision made after selection), and show that many previous methods and results are special cases of this general framework. We further extend this approach to problems with multiple selective risks and demonstrate how new methods can be developed. Our development leads to two surprising results about the BH procedure: (1) in the context of one-sided location testing, the BH procedure not only controls the false discovery rate at the null but also at other locations for free; (2) in the context of permutation tests, the BH procedure with exact permutation p-values can be well approximated by a procedure which only requires a total number of permutations that is almost linear in the total number of hypotheses.

What problem does this paper attempt to address?

The paper addresses how, in modern applications, the same data can be used both to select statistical tasks and to make valid inference after selection. Specifically, it proposes a unified method to control a class of selective risks: the risks incurred when statistical inference is carried out on hypotheses or models that were themselves chosen using the data.

### Main problem decomposition:

1. **Selection problem**:
   - How can potential signals be selected from a large number of candidates for further research?
   - For example, in multiple hypothesis testing, the goal is to reject as many hypotheses as possible while controlling a multiple-testing error rate such as the false discovery rate (FDR).
2. **Inference problem after selection**:
   - How can statistical inference be carried out after the data have been used to select some tasks?
   - A typical example is constructing confidence intervals for parameters that were selected using the data. This problem has seen many methodological breakthroughs in the past two decades.

### Research background and motivation:

Classical statistical theory usually assumes that scientific hypotheses and statistical models are fixed before the data are analyzed. In practice, however, data analysts often want to use the data to choose which models or hypotheses to test. This selection step can invalidate traditional inference procedures, which has given rise to a large literature on "selective inference".

### Core contributions of the paper:

1. **New interpretation of the BH procedure**:
   - The paper reinterprets the Benjamini-Hochberg (BH) procedure as the iterative limit of the Benjamini-Yekutieli (BY) procedure.
   - The BH procedure controls the false discovery rate (FDR) in the selection problem, while the BY procedure controls the false coverage-statement rate (FCR) in the post-selection problem.
2. **Constructive method**:
   - A method is proposed to control the extra-selection risk (selection made after decision) by iterating decision strategies that control the post-selection risk (decision made after selection).
   - It is shown that many previous methods and results are special cases of this general framework.
3. **Extension and application**:
   - The method is extended to problems with multiple selective risks, and it is shown how new methods can be developed.
   - Two surprising results about the BH procedure are revealed, one for one-sided location tests and one for permutation tests.

### Formula representation:

- The p-value for testing the null hypothesis \( H_i:\theta_i \geq 0 \) is defined as \( P_i=\Phi(X_i) \), where \( X_i \sim N(\theta_i, 1) \) and \( \Phi \) is the cumulative distribution function of the standard normal distribution.
- The decision rule of the BH procedure is: reject \( H_i \) if \( P_i \leq P^* \), where \( P^* = P_{(I^*)} \), \( I^*=\max\{i : P_{(i)} \leq \frac{i}{m}q\} \), and \( P_{(1)} \leq \dots \leq P_{(m)} \) are the ordered p-values. If no index satisfies the condition, no hypothesis is rejected.
- The condition for controlling FDR is:
  \[ \text{FDR}=E\left[\frac{\sum_{i = 1}^m S_i^*1\{\theta_i \geq 0\}}{1\vee\sum_{i = 1}^m S_i^*}\right]\leq q \]
  where \( S_i^* = 1\{P_i \leq P^*\} \).

With these tools, the paper provides a systematic framework for selective and post-selection inference, especially for multiple hypothesis testing and the construction of selective confidence intervals.
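The BH decision rule and its reading as a fixed-point iteration can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names and the toy p-values are hypothetical, and the iteration rests on one simplifying assumption: in the one-sided normal setting, a BY-style interval built at level \( 1 - rq/m \) excludes the null region \( \theta_i \geq 0 \) exactly when \( P_i \leq rq/m \), so each round reduces to re-counting the p-values below \( rq/m \).

```python
import numpy as np
from statistics import NormalDist


def one_sided_p_values(x):
    """P_i = Phi(X_i) for testing H_i: theta_i >= 0 with X_i ~ N(theta_i, 1)."""
    phi = NormalDist().cdf  # standard normal CDF
    return np.array([phi(float(v)) for v in x])


def bh_step_up(p, q):
    """Classical BH rule: reject H_i when P_i <= P_(I*)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    ordered = np.sort(p)
    thresholds = q * np.arange(1, m + 1) / m  # i * q / m for i = 1..m
    passing = np.nonzero(ordered <= thresholds)[0]
    if passing.size == 0:
        return np.zeros(m, dtype=bool)  # empty set: reject nothing
    p_star = ordered[passing[-1]]  # P_(I*), with I* the largest passing index
    return p <= p_star


def bh_fixed_point(p, q):
    """Iterate r -> #{i : P_i <= r * q / m} from r = m until it stops changing.

    Assumption (see lead-in): each round stands in for "build BY-style
    intervals for the r currently selected hypotheses, then keep only
    those whose interval excludes the null".
    """
    p = np.asarray(p, dtype=float)
    m = len(p)
    r = m
    while True:
        selected = p <= q * r / m
        r_new = int(selected.sum())
        if r_new == r:  # fixed point reached
            return selected
        r = r_new


# Toy p-values, made up for illustration only.
p_toy = np.array([0.001, 0.008, 0.039, 0.041, 0.042,
                  0.060, 0.074, 0.205, 0.212, 0.216])
rejected_step_up = np.nonzero(bh_step_up(p_toy, q=0.05))[0].tolist()
rejected_fixed_point = np.nonzero(bh_fixed_point(p_toy, q=0.05))[0].tolist()
print(rejected_step_up, rejected_fixed_point)  # both are [0, 1]
```

On the toy input, the iteration visits r = 10, 5, 2 and stops at 2, which matches the step-up index \( I^* = 2 \). Starting from r = m matters: the counts can only decrease from there, so the loop terminates at the largest fixed point, which is the BH rejection count.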

A constructive approach to selective risk control
