Abstract: $P$-values that are derived from continuously distributed test statistics are typically uniformly distributed on $(0,1)$ under least favorable parameter configurations (LFCs) in the null hypothesis. Conservativeness of a $p$-value $P$ (meaning that $P$ is under the null hypothesis stochastically larger than a random variable which is uniformly distributed on $(0,1)$) can occur if the test statistic from which $P$ is derived is discrete, or if the true parameter value under the null is not an LFC. To deal with both of these sources of conservativeness, we present two approaches utilizing randomized $p$-values, namely single-stage and two-stage randomization. We illustrate their effectiveness for testing a composite null hypothesis under a binomial model. We also give an example of how the proposed $p$-values can be used to test a composite null in group testing designs. Similar to previous findings, we find that the proposed randomized $p$-values are less conservative compared to non-randomized $p$-values under the null hypothesis, but that they are stochastically not smaller under the alternative. The problem of establishing the validity of randomized $p$-values is not trivial and has received attention in previous literature. We show that our proposed randomized $p$-values are valid under various discrete statistical models which are such that the distribution of the corresponding test statistic belongs to an exponential family. The behaviour of the power function for the tests based on the proposed randomized $p$-values as a function of the sample size is also investigated. Simulations and a real data analysis are used to compare the different considered $p$-values.
What problem does this paper attempt to address?
This paper addresses the conservativeness of p-values in multiple testing of composite null hypotheses with discrete data. It targets two sources of conservativeness:
1. **Non-uniformity of p-values caused by discrete data**: When the test statistic has a discrete distribution, the resulting p-values are also discrete rather than uniformly distributed on (0, 1) under the null hypothesis. This makes them conservative, i.e., stochastically larger under the null hypothesis than a random variable that is uniformly distributed on (0, 1).
2. **Non-uniformity of p-values under a composite null hypothesis**: Even if the test statistic is continuous, a composite null hypothesis (one that contains multiple parameter values) leads to conservative p-values whenever the true parameter value is not a least favorable configuration (LFC). The LFC usually lies on the boundary of the null hypothesis, so the deeper the true parameter value lies inside the null hypothesis, the further the p-values are from uniform.
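Both sources can be seen numerically in a small example. The following is a minimal Python sketch, assuming a one-sided binomial test of H0: θ ≤ θ* against H1: θ > θ* with T ~ Bin(n, θ), so that θ* is the LFC. The function names (`pmf`, `p_lfc`, `rejection_prob`) and the values n = 10, θ* = 0.5, α = 0.05 are illustrative assumptions, not taken from the paper.

```python
from math import comb


def pmf(k, n, theta):
    """Binomial probability P_theta(T = k) for T ~ Bin(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def p_lfc(t, n, theta_star):
    """LFC-based p-value P_{theta*}(T >= t) for the assumed one-sided test."""
    return sum(pmf(k, n, theta_star) for k in range(t, n + 1))


def rejection_prob(alpha, n, theta, theta_star):
    """P_theta(P_LFC(T) <= alpha): chance of rejecting at level alpha
    when the true parameter is theta."""
    return sum(
        pmf(t, n, theta)
        for t in range(n + 1)
        if p_lfc(t, n, theta_star) <= alpha
    )


# Assumed illustrative setting.
n, theta_star, alpha = 10, 0.5, 0.05

# Source 1 (discreteness): true parameter equals the LFC.
at_lfc = rejection_prob(alpha, n, theta_star, theta_star)

# Source 2 (non-LFC): true parameter lies inside the null hypothesis.
inside_null = rejection_prob(alpha, n, 0.3, theta_star)

print(at_lfc, inside_null)
```

In this setting the rejection probability at the LFC is 11/1024, well below the nominal 0.05, purely because of discreteness. Moving the true parameter to 0.3, inside the null hypothesis, lowers it further.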
To address these problems, the authors propose two methods based on randomized p-values:
- **Single-stage randomization**: A random variable is introduced to adjust the discrete p-values, bringing their distribution closer to uniform.
- **Two-stage randomization**: A first randomization converts the discrete p-values into continuous ones; a second randomization then addresses the conservativeness caused by the composite null hypothesis.
These methods aim to bring the p-values closer to a uniform distribution under the null hypothesis while maintaining reasonable power under the alternative hypothesis. The authors compare the p-values in simulations and a real data analysis, and they show that the randomized p-values are valid under several discrete statistical models in which the distribution of the test statistic belongs to an exponential family.
### Formula summary
- **LFC-based p-value**:
\[
P_{\text{LFC}}(\mathbf{X}) = 1 - F_{\theta^*}(T(\mathbf{X})^-)
\]
where \( F_{\theta^*} \) is the cumulative distribution function (CDF) of the test statistic \( T \) under the LFC parameter \(\theta^*\), and \( f(x^-) := \lim_{y \uparrow x} f(y) \).
- **Single-stage randomized p-value**:
\[
P_{\text{rand1}}(\mathbf{X}, U, c) = U \cdot 1_{\{P_{\text{LFC}}(\mathbf{X}) \geq c\}} + \frac{P_{\text{LFC}}(\mathbf{X})}{c^*} \cdot 1_{\{P_{\text{LFC}}(\mathbf{X}) < c\}}
\]
where \( c^* = P_{\theta^*}\{P_{\text{LFC}}(\mathbf{X}) < c\}\), \( U \sim \text{UNI}[0, 1] \).
- **Two-stage randomized p-value**:
\[
P_{\text{rand2}}(\mathbf{X}, U, \tilde{U}, c) = \tilde{U} \cdot 1_{\{P_{\text{rand}}^T(\mathbf{X}, U) \geq c\}} + \frac{P_{\text{rand}}^T(\mathbf{X}, U)}{c} \cdot 1_{\{P_{\text{rand}}^T(\mathbf{X}, U) < c\}}
\]
where \( P_{\text{rand}}^T(\mathbf{X}, U) = \sum_{\mathbf{y}: T(\mathbf{y}) > T(\mathbf{X})} f_{\theta^*}(\mathbf{y}) + U \sum_{\mathbf{y}: T(\mathbf{y}) = T(\mathbf{X})} f_{\theta^*}(\mathbf{y}) \), \( \tilde{U} \sim \text{UNI}[0, 1] \).
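The three p-values in this summary can be sketched for a binomial model as follows. This is a minimal sketch, not the authors' implementation: it assumes a one-sided test of H0: θ ≤ θ* against H1: θ > θ* with T ~ Bin(n, θ), and all function names and numerical values are hypothetical. In the second stage, the branch below the threshold is divided by c, mirroring the c* scaling of the single-stage formula. That scaling is an assumption of this sketch, chosen because it makes the result exactly uniform at the LFC.

```python
import random
from math import comb


def pmf(k, n, theta):
    """Binomial probability P_theta(T = k) for T ~ Bin(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def p_lfc(t, n, theta_star):
    """P_LFC = 1 - F_{theta*}(t^-) = P_{theta*}(T >= t)."""
    return sum(pmf(k, n, theta_star) for k in range(t, n + 1))


def c_star(c, n, theta_star):
    """c* = P_{theta*}(P_LFC(T) < c), summed over the support of T."""
    return sum(
        pmf(k, n, theta_star)
        for k in range(n + 1)
        if p_lfc(k, n, theta_star) < c
    )


def p_rand1(t, u, c, n, theta_star):
    """Single-stage: return u if P_LFC >= c, else P_LFC / c*."""
    p = p_lfc(t, n, theta_star)
    if p >= c:
        return u
    return p / c_star(c, n, theta_star)


def p_rand_t(t, u, n, theta_star):
    """First stage: P_{theta*}(T > t) + u * P_{theta*}(T = t)."""
    return p_lfc(t + 1, n, theta_star) + u * pmf(t, n, theta_star)


def p_rand2(t, u, u_tilde, c, n, theta_star):
    """Two-stage: return u_tilde if P^T_rand >= c, else P^T_rand / c
    (division by c is an assumption of this sketch)."""
    p = p_rand_t(t, u, n, theta_star)
    if p >= c:
        return u_tilde
    return p / c


# Assumed illustrative setting: n = 10 trials, LFC theta* = 0.5, c = 0.5.
rng = random.Random(0)
n, theta_star, c, t_obs = 10, 0.5, 0.5, 8
u, u_tilde = rng.random(), rng.random()
print(p_lfc(t_obs, n, theta_star))
print(p_rand1(t_obs, u, c, n, theta_star))
print(p_rand2(t_obs, u, u_tilde, c, n, theta_star))
```

The uniform draws `u` and `u_tilde` are generated independently of the data, which is what the randomization requires. The threshold `c` is a tuning parameter in (0, 1]; the value 0.5 here is arbitrary.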
Through these methods, the paper aims to improve...