Flexible control of the median of the false discovery proportion

Jesse Hemerik, Aldo Solari, Jelle J Goeman
DOI: https://doi.org/10.1093/biomet/asae018
IF: 3.0279
2024-03-23
Biometrika
Abstract: We introduce a multiple testing procedure that controls the median of the proportion of false discoveries in a flexible way. The procedure only requires a vector of p-values as input and is comparable to the Benjamini–Hochberg method, which controls the mean of the proportion of false discoveries. Our method allows free choice of one or several values of alpha after seeing the data, unlike the Benjamini–Hochberg procedure, which can be very anti-conservative when alpha is chosen post hoc. We prove these claims and illustrate them with simulations. Our procedure is inspired by a popular estimator of the total number of true hypotheses. We adapt this estimator to provide simultaneously median unbiased estimators of the proportion of false discoveries, valid for finite samples. This simultaneity allows for the claimed flexibility. Our approach does not assume independence. The time complexity of our method is linear in the number of hypotheses, after sorting the p-values.
statistics & probability, mathematical & computational biology, biology
What problem does this paper attempt to address?
The paper addresses control of the median, rather than the mean, of the false discovery proportion (FDP) in multiple hypothesis testing. The Benjamini–Hochberg method controls the mean of the FDP (the false discovery rate, FDR), but only for a value of α fixed in advance. When α is chosen after looking at the data, Benjamini–Hochberg can become very anti-conservative. The paper therefore proposes a multiple testing procedure that allows one or several α values to be chosen freely after seeing the data, and that provides simultaneously valid median unbiased estimators of the FDP. Specifically, the main contributions of the paper include:

1. **Flexible FDP Median Control**: The method allows free selection of α after observing the data without losing validity. This contrasts with the Benjamini–Hochberg method, which can become very anti-conservative when α is chosen post hoc.
2. **Non-Asymptotic Method**: The method requires only a vector of p-values as input and is valid for finite samples.
3. **Median Unbiased Estimation**: By adapting a popular estimator of the number of true hypotheses, the paper obtains median unbiased estimators of the FDP, so that the FDP stays at or below the chosen threshold γ with probability at least 50%.
4. **Simultaneity Guarantee**: The method provides simultaneously valid 50% confidence upper bounds for the FDP, so several α values can be chosen after observing the data and all resulting statements remain valid.
5. **No Independence Assumption Required**: The method does not require the p-values to be independent, so it also applies when they are correlated.
6. **Low Time Complexity**: After sorting the p-values, the time complexity of the method is linear in the number of hypotheses.

In general, the paper aims to provide a more flexible multiple testing method that overcomes the limitations of existing methods in the choice of α, especially when α is chosen or adjusted after observing the data.
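
For orientation, here is a minimal Python sketch of the reference points named above. It is not the paper's median-controlling procedure, whose details are not given in this summary. It shows the Benjamini–Hochberg step-up rule, the FDP of a rejection set when the truth is known (as in a simulation), and one widely used estimator of the number of true hypotheses, often attributed to Schweder and Spjøtvoll and to Storey. Whether that is exactly the estimator the authors adapt is an assumption here, and the function names and the tuning value `lam=0.5` are illustrative choices.

```python
import numpy as np


def benjamini_hochberg(pvals, alpha):
    """Return a boolean mask of hypotheses rejected by the BH step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices that sort the p-values
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds              # p_(k) <= alpha * k / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()          # largest k meeting the bound
        reject[order[: k + 1]] = True           # reject the k+1 smallest p-values
    return reject


def false_discovery_proportion(reject, is_null):
    """FDP = false rejections / rejections, defined as 0 with no rejections."""
    reject = np.asarray(reject, dtype=bool)
    is_null = np.asarray(is_null, dtype=bool)
    n_reject = reject.sum()
    if n_reject == 0:
        return 0.0
    return float((reject & is_null).sum() / n_reject)


def estimated_null_count(pvals, lam=0.5):
    """Schweder-Spjotvoll/Storey-type estimate of the number of true hypotheses.

    Assumption: lam is an illustrative tuning parameter in (0, 1).
    """
    p = np.asarray(pvals, dtype=float)
    return float((p > lam).sum() / (1.0 - lam))
```

In a simulation where `is_null` is known, calling `benjamini_hochberg` on many simulated p-value vectors and comparing the mean with the median of the resulting `false_discovery_proportion` values makes the distinction between FDR control and median FDP control concrete.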