False Discovery Control in Multiple Testing: A Brief Overview of Theories and Methodologies

Jianliang He, Bowen Gang, Luella Fu
2024-11-16
Abstract: As the volume and complexity of data continue to expand across various scientific disciplines, the need for robust methods to account for the multiplicity of comparisons has grown widespread. A popular measure of the type I error rate in the multiple testing literature is the false discovery rate (FDR). The FDR provides a powerful and practical approach to large-scale multiple testing and has been successfully used in a wide range of applications. The concept of FDR has gained wide acceptance in the statistical community, and various methods have been proposed to control the FDR. In this work, we review the latest developments in FDR control methodologies. We also develop a conceptual framework to better describe this vast literature, understand its intuition and key ideas, and provide guidance for the researcher interested in both the application and development of the methodology.
Methodology, Statistics Theory, Applications
What problem does this paper attempt to address?
The paper addresses how to control the false discovery rate (FDR) effectively in large-scale multiple hypothesis testing. As data grow in volume and complexity, multiple comparisons become more common, which can contribute to non-reproducible results, publication bias, and p-hacking in scientific research. Researchers therefore need powerful methods for handling multiplicity. Specifically, the paper focuses on the following points:

1. **Definition and Importance of FDR**:
   - The FDR is the expected proportion of false discoveries among all discoveries:
     \[ \text{FDR}(R)=E\left[\frac{|R \cap H_0|}{|R| \vee 1}\right] \]
     where \(R\) is the set of rejected hypotheses, \(H_0\) is the set of true null hypotheses, and \(|R| \vee 1 = \max(|R|, 1)\), so the ratio is zero when nothing is rejected.
2. **Methodological Development of FDR Control**:
   - The paper reviews the latest FDR control methods and proposes a general framework for describing them, helping researchers understand the intuition and key ideas behind them.
   - It introduces and discusses classic methods such as the Benjamini-Hochberg (BH) procedure and its variants, and the Sun-Cai (SC) procedure and its variants.
3. **FDR Control under Dependence Structures**:
   - In practice, hypothesis tests are often dependent on one another, which poses a challenge for FDR control. The paper explores FDR control methods under different dependence structures, including positive regression dependence on a subset (PRDS), weak dependence, and factor models.
4. **Utilization of Auxiliary Information**:
   - Researchers can often use additional information, such as covariates, to improve FDR control methods. The paper describes how to integrate this auxiliary information into the FDR control process to improve detection power.
5. **Application of e-values**:
   - An e-value is a non-negative random variable \(e\) that satisfies \(E[e] \leq 1\) under the null hypothesis. The paper discusses the use of e-values in FDR control, especially their validity under arbitrary dependence.

In summary, the paper aims to give researchers a comprehensive review of FDR control methods and to offer new insights and frameworks for the increasingly complex multiple hypothesis testing problems of modern scientific research.
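
To make the FDR definition and the BH procedure mentioned above concrete, here is a minimal sketch in plain Python. It assumes the standard step-up form of BH (reject the k smallest p-values, where k is the largest rank with p_(k) <= k * alpha / m); the function names, the example p-values, and the choice of which hypotheses are true nulls are illustrative assumptions, not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices rejected by the step-up BH procedure.

    Assumes the textbook form: find the largest rank k such that the
    k-th smallest p-value is <= k * alpha / m, then reject the k
    hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank  # keep the largest rank that passes (step-up)
    return sorted(order[:k])


def false_discovery_proportion(rejected, true_nulls):
    """|R ∩ H_0| / (|R| ∨ 1): the quantity whose expectation is the FDR."""
    rejected = set(rejected)
    return len(rejected & set(true_nulls)) / max(len(rejected), 1)


# Hypothetical p-values for ten tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, alpha=0.05)

# Hypothetical ground truth: suppose hypotheses 1 and 5 are true nulls.
fdp = false_discovery_proportion(rejected, true_nulls={1, 5})
```

The step-up rule matters: a hypothesis can be rejected even if its own p-value misses its threshold, as long as some larger-ranked p-value passes. The FDR is the expectation of the proportion computed by `false_discovery_proportion` over repeated experiments, so a single realized value can exceed alpha even when the procedure controls the FDR.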