Wenge Guo, Joseph P. Romano
Abstract: When dealing with the problem of simultaneously testing a large number of null hypotheses, a natural testing strategy is to first reduce the number of tested hypotheses by some selection (screening or filtering) process, and then to simultaneously test the selected hypotheses. The main advantage of this strategy is to greatly reduce the severe effect of high dimensions. However, the first screening or selection stage must be properly accounted for in order to maintain some type of error control. In this paper, we will introduce a selection rule based on a selection statistic that is independent of the test statistic when the tested hypothesis is true. Combining this selection rule and the conventional Bonferroni procedure, we can develop a powerful and valid two-stage procedure. The introduced procedure has several nice properties: (i) it completely removes the selection effect; (ii) it reduces the multiplicity effect; (iii) it does not "waste" data while carrying out both selection and testing. Asymptotic power analysis and simulation studies illustrate that this proposed method can provide higher power compared to usual multiple testing methods while controlling the Type 1 error rate. Optimal selection thresholds are also derived based on our asymptotic analysis.
What problem does this paper attempt to address?
This paper addresses the loss of power that arises when a large number of hypotheses are tested simultaneously, and in particular how to increase power while still controlling the Type 1 error rate. It focuses on reducing the multiplicity burden in high-dimensional multiple hypothesis testing through a two-stage method that keeps the test valid. The main contribution is a selection rule based on a selection statistic that is independent of the test statistic under the null hypothesis, which is combined with the conventional Bonferroni procedure to give a valid and powerful two-stage testing procedure.
### Main problems in the paper
1. **Multiple hypothesis testing in high-dimensional data**:
- When a large number of hypotheses are tested simultaneously, standard multiple-testing procedures have low power because they must strictly control a Type 1 error rate such as the family-wise error rate (FWER), which makes true alternatives hard to detect.
- To improve the power, a common practice is to first reduce the number of hypotheses to be tested through some selection (screening or filtering) process and then test the screened hypotheses.
2. **Control of selection effects**:
- The first-stage screening or selection must be properly accounted for in order to maintain some type of error control. If the effect of the screening stage is ignored, the Type 1 error rate may no longer be controlled.
- The paper proposes a selection rule based on a selection statistic that is independent of the test statistic when the null hypothesis is true. In this way, error control can be simplified because the conditional distribution and the unconditional distribution are the same.
3. **Improvement of the two-stage method**:
- The paper proposes a two-stage method that combines independent screening and the traditional Bonferroni procedure. This method has the following advantages:
- It completely eliminates the selection effect.
- It reduces the multiplicity effect.
- It does not "waste" data: the same data are used for both selection and testing.
- Asymptotic analysis and simulation studies indicate that this method can achieve higher power than conventional multiple-testing methods while controlling the Type 1 error rate.
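Why independence removes the selection effect can be sketched in a few lines. The sketch makes assumptions that go beyond what is stated above: each null hypothesis is taken to be \(H_i:\mu_i=0\), the \(m\) populations are independent, and the second stage applies Bonferroni at level \(\alpha/\vert\hat{S}_n\vert\) to the selected set \(\hat{S}_n\). The symbols \(I_0\) (indices of true nulls), \(V\) (number of false rejections) and \(c(k)=t_{n-1}(1-\frac{\alpha}{2k})\) are notation introduced here for illustration, with \(S_{n,i}\) and \(T_{n,i}\) denoting the selection and test statistics of \(H_i\).

\[
\begin{aligned}
P\bigl(V \ge 1 \mid S_{n,1},\dots,S_{n,m}\bigr)
&\le \sum_{i \in I_0 \cap \hat{S}_n} P\bigl(\vert T_{n,i}\vert \ge c(\vert\hat{S}_n\vert) \,\big|\, S_{n,1},\dots,S_{n,m}\bigr) \\
&= \sum_{i \in I_0 \cap \hat{S}_n} \frac{\alpha}{\vert\hat{S}_n\vert}
 = \alpha\,\frac{\vert I_0 \cap \hat{S}_n\vert}{\vert\hat{S}_n\vert}
 \le \alpha .
\end{aligned}
\]

The first line is the union bound. The equality holds because \(\hat{S}_n\) depends only on the selection statistics, so the cutoff is fixed once they are given, while for a true null \(T_{n,i}\) is independent of all of them and keeps its \(t_{n-1}\) distribution. Averaging over the selection statistics then gives \(\mathrm{FWER}\le\alpha\) under these assumptions; when nothing is selected, \(V=0\) and the bound is trivial.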
### Specific content of the paper
- **Model setup**:
- Assume that samples are drawn from independent normal populations, and the mean \(\mu_i\) and variance \(\sigma_i^2\) of each population are unknown.
- Define two statistics \(S_{n,i}\) and \(T_{n,i}\), which are used for selection and hypothesis testing respectively.
- \(S_{n,i}=\sum_{j = 1}^nX_{i,j}^2\), \(T_{n,i}=\sqrt{n}\frac{\bar{X}_{n,i}}{\hat{\sigma}_{n,i}}\), where \(\bar{X}_{n,i}\) is the sample mean of the \(i\)-th sample, and \(\hat{\sigma}_{n,i}\) is the sample standard deviation of the \(i\)-th sample (the square root of the unbiased sample variance).
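- **Two-stage strategy**:
- First stage: Use the \(S_{n,i}\) statistic to select which hypotheses enter the second-stage test. Given a threshold \(u\), if \(S_{n,i}\geq u\), then select hypothesis \(H_i\).
- Second stage: Use the \(T_{n,i}\) statistic to test the selected hypotheses. If \(\vert T_{n,i}\vert\geq t_{n - 1}(1-\frac{\alpha}{2\vert\hat{S}_n\vert})\), then reject \(H_i\). Here \(\hat{S}_n\) is the set of selected hypotheses and \(t_{n-1}(q)\) is the \(q\)-quantile of the t distribution with \(n-1\) degrees of freedom, so this is a two-sided Bonferroni test at level \(\alpha/\vert\hat{S}_n\vert\).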
- **Error control**:
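- By Basu's theorem, \(S_{n,i}\) and \(T_{n,i}\) are independent when the null hypothesis is true. Conditioning on selection therefore does not change the null distribution of \(T_{n,i}\), so the second stage controls the Type 1 error rate without any correction for selection, and power increases because fewer hypotheses are tested.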
- Through asymptotic analysis, the optimal form of the selection threshold \(u\) is determined, so that the power is maximized while controlling the Type 1 error rate.
- **Extensions and improvements**:
- Discuss the validity of the two-stage method in the case of unknown variances and dependence.
- Propose a stepwise improvement based on the Holm method, which further improves the power.
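The two-stage strategy described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `t_two_sided_pvalue` and `two_stage_bonferroni`, the toy data, and the threshold `u=1.0` are all hypothetical, and the asymptotically optimal threshold is not reproduced here. The rejection rule \(\vert T_{n,i}\vert\geq t_{n-1}(1-\frac{\alpha}{2\vert\hat{S}_n\vert})\) is applied in its equivalent p-value form, and the t tail probability is computed by a small numerical integration so that the sketch needs only the standard library; a statistics library's t distribution would normally be used instead.

```python
import math
import statistics


def t_two_sided_pvalue(t_stat, df, steps=2000):
    """Approximate P(|T| >= |t_stat|) for a Student t variable with integer df.

    Substituting t = sqrt(df) * tan(theta) turns the t density into
    const * cos(theta)**(df - 1) on [0, pi/2); Simpson's rule does the rest.
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    lower = math.atan(abs(t_stat) / math.sqrt(df))
    upper = math.pi / 2
    h = (upper - lower) / steps  # steps must be even for Simpson's rule
    total = math.cos(lower) ** (df - 1) + math.cos(upper) ** (df - 1)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * math.cos(lower + k * h) ** (df - 1)
    upper_tail = const * total * h / 3
    return 2 * upper_tail


def two_stage_bonferroni(samples, u, alpha=0.05):
    """Return (selected, rejected) index lists for a list of samples.

    Assumes each null hypothesis is mu_i = 0 and each sample has at least
    two observations that are not all equal.
    """
    # Stage 1: keep H_i when S_{n,i} = sum_j X_{i,j}^2 is at least u.
    selected = [i for i, x in enumerate(samples) if sum(v * v for v in x) >= u]

    # Stage 2: Bonferroni over the selected hypotheses only.
    rejected = []
    for i in selected:
        x = samples[i]
        n = len(x)
        # T_{n,i} = sqrt(n) * sample mean / sample standard deviation
        t_stat = math.sqrt(n) * statistics.mean(x) / statistics.stdev(x)
        # |T| >= t_{n-1}(1 - alpha / (2k)) is equivalent to p-value <= alpha / k
        if t_two_sided_pvalue(t_stat, n - 1) <= alpha / len(selected):
            rejected.append(i)
    return selected, rejected


# Hypothetical toy data: three populations with five observations each.
toy_samples = [
    [5.0, 5.1, 4.9, 5.2, 4.8],      # mean far from zero
    [0.1, -0.1, 0.05, -0.05, 0.0],  # small values, screened out by stage 1
    [2.0, -2.0, 1.5, -1.5, 0.0],    # large spread but zero mean
]
selected, rejected = two_stage_bonferroni(toy_samples, u=1.0)
print("selected:", selected, "rejected:", rejected)
```

On this toy data, populations 0 and 2 pass the selection stage and only population 0 is rejected, so the Bonferroni correction divides \(\alpha\) by 2 rather than by 3.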
In summary, this paper...