Error-rate and decision-theoretic methods of multiple testing: Which genes have high objective probabilities of differential expression?

David R. Bickel
DOI: https://doi.org/10.48550/arXiv.math/0212028
2004-03-30
Abstract: Given a multiple testing situation, the null hypotheses that appear to have sufficiently low probabilities of truth may be rejected using a simple, nonparametric method of decision theory. This applies not only to posterior levels of belief, but also to conditional probabilities in the sense of relative frequencies, as seen from their equality to local false discovery rates (dFDRs). This approach neither requires the estimation of probability densities, nor of their ratios. Decision theory can inform the selection of false discovery rate weights. Decision theory is applied to gene expression microarrays with discussion of the applicability of the assumption of weak dependence.
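To make the abstract's idea of rejecting nulls with "sufficiently low probabilities of truth" concrete, here is a minimal sketch under an assumed linear utility that is not taken from the paper. Each rejection of a false null earns a benefit `b`, each rejection of a true null costs `c`, and not rejecting earns nothing. If `p_i` is the probability that the i-th null hypothesis is true, rejecting it changes the expected net benefit by `b*(1 - p_i) - c*p_i`. That change is positive exactly when `p_i < b/(b + c)`. The probabilities are taken as given here. The function names `rejection_threshold` and `reject_low_probability_nulls` are hypothetical, and the sketch does not reproduce how the paper obtains `p_i`.

```python
from typing import List


def rejection_threshold(benefit: float, cost: float) -> float:
    """Probability cutoff b / (b + c) below which rejecting a null pays off.

    Assumes a hypothetical linear utility: +benefit for rejecting a false null,
    -cost for rejecting a true null, and 0 for not rejecting.
    """
    if benefit <= 0 or cost <= 0:
        raise ValueError("benefit and cost must be positive")
    return benefit / (benefit + cost)


def reject_low_probability_nulls(
    null_probs: List[float], benefit: float, cost: float
) -> List[int]:
    """Return indices of hypotheses whose null probability is below the cutoff.

    Rejecting hypothesis i adds benefit * (1 - p_i) - cost * p_i to the
    expected net benefit, which is positive exactly when p_i < b / (b + c).
    """
    cutoff = rejection_threshold(benefit, cost)
    return [i for i, p in enumerate(null_probs) if p < cutoff]


if __name__ == "__main__":
    # Illustrative (made-up) probabilities that each gene's null hypothesis is true.
    probs = [0.02, 0.40, 0.10, 0.75, 0.19]
    # Benefit 1 and cost 4 give a cutoff of 0.2, so genes 0, 2 and 4 are rejected.
    print(reject_low_probability_nulls(probs, benefit=1.0, cost=4.0))  # → [0, 2, 4]
```

Raising the cost of a false discovery relative to the benefit of a true one lowers the cutoff, so fewer genes are rejected.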
Probability, Numerical Analysis
What problem does this paper attempt to address?
This paper addresses how to control the false discovery rate (FDR) in multiple hypothesis testing, with particular attention to the analysis of gene expression microarray data. Specifically, the author proposes a decision-theoretic method for selecting genes that have a high objective probability of differential expression.

### Main problems

1. **How to control the false discovery rate in multiple hypothesis testing**
   - Traditional approaches, such as the FDR-controlling procedure of Benjamini and Hochberg (1995), may not be flexible or effective enough in some settings.
   - The author introduces the "decisive false discovery rate" (dFDR), an error-rate measure intended for optimizing multiple hypothesis tests within a decision-theoretic framework.
2. **How to select genes with a high objective probability of differential expression**
   - Identifying which genes are differentially expressed is a central problem in the analysis of gene expression microarray data.
   - The author proposes a nonparametric, decision-theoretic method that identifies the null hypotheses with sufficiently low probabilities of being true and rejects them.

### Solutions

- **Definition and properties of dFDR**:
  \[
  D = \begin{cases}
  \frac{\mathbb{E}\left[\sum_{i = 1}^{m} V_i\right]}{\mathbb{E}\left[\sum_{i = 1}^{m} R_i\right]} & \text{if } \mathbb{P}\left(\sum_{i = 1}^{m} R_i>0\right)>0 \\
  0 & \text{if } \mathbb{P}\left(\sum_{i = 1}^{m} R_i>0\right) = 0
  \end{cases}
  \]
  where \(m\) is the number of null hypotheses tested, \(V_i\) indicates whether the \(i\)-th null hypothesis is a false discovery (true but rejected), and \(R_i\) indicates whether the \(i\)-th null hypothesis is rejected. The dFDR is therefore the expected number of false discoveries divided by the expected number of rejections.
- **Application of decision theory**:
  - Costs and benefits are assigned to decisions, and the expected net benefit is maximized while the dFDR is controlled.
  - Decision theory can be used to choose the rejection region \(G\) that maximizes the expected net benefit.
- **Estimation and optimization**:
  - A method for estimating the dFDR by resampling (for example, the bootstrap) is proposed.
  - The dFDR estimate is then used to choose the threshold \(t\) (or \(P\)-value cutoff) that maximizes the net benefit for a given cost-benefit ratio.

### Application background

The method is especially suited to gene expression microarray data, where the differential expression of a large number of genes must be tested simultaneously. By controlling the dFDR, researchers can detect as many truly differentially expressed genes as possible while keeping the error rate at an acceptable level.

In summary, the paper aims to provide a more flexible and effective multiple-testing strategy by introducing the dFDR and decision-theoretic methods, which are especially suitable for high-throughput data analysis in the biochemical and pharmaceutical fields.
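The dFDR definition and the cost-benefit choice of a threshold \(t\) can be tied together in a small simulation. This is a sketch under assumptions that are not from the paper, and it is not the author's resampling estimator. P-values are simulated as Uniform(0, 1) for true nulls and Beta(0.1, 1) for differentially expressed genes. The rejection region is \(G = \{p \le t\}\). Each correct rejection earns a benefit `b`, and each false rejection costs `c`.

Because the simulation knows which nulls are true, averaging over replicates gives Monte Carlo estimates of \(\mathbb{E}[\sum V_i]\), \(\mathbb{E}[\sum R_i]\), and hence \(D\). The expected net benefit \(b(\mathbb{E}[\sum R_i] - \mathbb{E}[\sum V_i]) - c\,\mathbb{E}[\sum V_i]\) equals \(\mathbb{E}[\sum R_i]\,(b - (b + c)D)\). All function names below are hypothetical.

```python
import random
from typing import List, Tuple


def simulate_pvalues(
    m: int, pi0: float, rng: random.Random
) -> Tuple[List[float], List[bool]]:
    """Simulate p-values for m genes (assumed model, not from the paper).

    A fraction pi0 of genes have a true null (p ~ Uniform(0, 1)); the rest are
    differentially expressed, with p ~ Beta(0.1, 1), which piles up near 0.
    """
    pvals, is_null = [], []
    for _ in range(m):
        null = rng.random() < pi0
        p = rng.random() if null else rng.betavariate(0.1, 1.0)
        pvals.append(p)
        is_null.append(null)
    return pvals, is_null


def monte_carlo_dfdr(
    t: float, m: int, pi0: float, reps: int, seed: int = 0
) -> Tuple[float, float, float]:
    """Estimate E[sum V_i], E[sum R_i] and D = E[sum V_i] / E[sum R_i].

    Gene i is rejected (R_i = 1) when its p-value is at most t; it is a false
    discovery (V_i = 1) when it is rejected and its null is true.
    """
    # A fixed seed gives every threshold the same simulated data
    # (common random numbers), so thresholds are compared fairly.
    rng = random.Random(seed)
    total_v = total_r = 0
    for _ in range(reps):
        pvals, is_null = simulate_pvalues(m, pi0, rng)
        for p, null in zip(pvals, is_null):
            if p <= t:
                total_r += 1
                total_v += int(null)
    mean_v, mean_r = total_v / reps, total_r / reps
    # Mirror the definition: D = 0 when no rejections were observed.
    dfdr = mean_v / mean_r if mean_r > 0 else 0.0
    return mean_v, mean_r, dfdr


def expected_net_benefit(mean_v: float, mean_r: float, b: float, c: float) -> float:
    """b * E[true discoveries] - c * E[false discoveries] = E[R] * (b - (b + c) * D)."""
    return b * (mean_r - mean_v) - c * mean_v


def best_threshold(
    thresholds: List[float], m: int, pi0: float, b: float, c: float, reps: int = 100
) -> Tuple[float, float]:
    """Grid search for the p-value cutoff t with the largest expected net benefit."""
    best_t, best_value = thresholds[0], float("-inf")
    for t in thresholds:
        mean_v, mean_r, _ = monte_carlo_dfdr(t, m, pi0, reps)
        value = expected_net_benefit(mean_v, mean_r, b, c)
        if value > best_value:
            best_t, best_value = t, value
    return best_t, best_value


if __name__ == "__main__":
    grid = [0.001, 0.005, 0.01, 0.05, 0.1]
    for t in grid:
        v, r, d = monte_carlo_dfdr(t, m=500, pi0=0.9, reps=100)
        print(f"t={t:<6} E[V]={v:7.2f} E[R]={r:7.2f} dFDR={d:.3f}")
    print(best_threshold(grid, m=500, pi0=0.9, b=1.0, c=4.0))
```

In real data the true nulls are unknown, so \(\mathbb{E}[\sum V_i]\) has to be estimated rather than counted. The grid search shows only how a dFDR estimate and a cost-benefit ratio combine to pick \(t\).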