Confidence distributions and hypothesis testing

Eugenio Melilli, Piero Veronese
DOI: https://doi.org/10.1007/s00362-024-01542-4
2024-03-29
Statistical Papers
Abstract: The traditional frequentist approach to hypothesis testing has recently come under extensive debate, raising several critical concerns. Additionally, practical applications often blend the decision-theoretical framework pioneered by Neyman and Pearson with the inductive inferential process relied on the p-value, as advocated by Fisher. The combination of the two methods has led to interpreting the p-value as both an observed error rate and a measure of empirical evidence for the hypothesis. Unfortunately, both interpretations pose difficulties. In this context, we propose that resorting to confidence distributions can offer a valuable solution to address many of these critical issues. Rather than suggesting an automatic procedure, we present a natural approach to tackle the problem within a broader inferential context. Through the use of confidence distributions, we show the possibility of defining two statistical measures of evidence that align with different types of hypotheses under examination. These measures, unlike the p-value, exhibit coherence, simplicity of interpretation, and ease of computation, as exemplified by various illustrative examples spanning diverse fields. Furthermore, we provide theoretical results that establish connections between our proposal, other measures of evidence given in the literature, and standard testing concepts such as size, optimality, and the p-value.
statistics & probability
What problem does this paper attempt to address?
The paper addresses several key problems with traditional frequentist hypothesis testing:

1. **Multiple interpretations of p-values**: p-values are widely used in practice to quantify the degree of support for the null hypothesis, but their interpretation is controversial. They are read both as an observed error rate and as a measure of empirical evidence for the hypothesis, and both readings pose difficulties.
2. **Confusion between the decision-theoretic framework and the inductive reasoning process**: Common practice blends Neyman and Pearson's decision-theoretic framework with Fisher's inductive use of the p-value. This blend is the source of the two conflicting interpretations above.
3. **Incoherence of p-values**: As a measure of statistical evidence, the p-value lacks coherence and is hard to interpret. For example, Schervish (1996) pointed out that, for the same observed data, p-values can behave inconsistently when different hypotheses are tested.

To address these issues, the authors propose using **Confidence Distributions (CD)**. With confidence distributions, two measures of statistical evidence can be defined, each suited to a different type of hypothesis. These measures have the following characteristics:

- **Coherence**: Unlike p-values, they behave consistently across the hypotheses being compared.
- **Ease of interpretation**: They are simpler to understand and interpret.
- **Ease of computation**: They are relatively simple to compute.

The authors illustrate the measures with examples from several fields and provide theoretical results that connect them to other measures of evidence in the literature and to standard testing concepts (size, optimality, and the p-value).
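The kind of incoherence attributed to Schervish (1996) can be illustrated with a small numerical sketch. It rests on assumptions that are not taken from the summary above: a single observation X ~ N(θ, 1), two nested interval hypotheses with illustrative endpoints, the p-value of the usual unbiased test for an interval hypothesis, and the confidence distribution N(x, 1) for θ. The function names (`norm_cdf`, `interval_p_value`, `cd_mass`) are hypothetical, and reading the CD mass of an interval as its evidence is a simplification that may not match the paper's exact definitions.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def interval_p_value(x: float, a: float, b: float) -> float:
    """p-value for H: a <= theta <= b from one observation X ~ N(theta, 1).

    Assumption: the test rejects when |X - midpoint| is large, and the
    p-value is the probability of a deviation at least as large as the
    observed one, evaluated at an endpoint of the interval.
    """
    mid = (a + b) / 2.0
    half = (b - a) / 2.0
    d = abs(x - mid)
    return norm_cdf(half - d) + norm_cdf(-half - d)


def cd_mass(x: float, a: float, b: float) -> float:
    """Mass that the confidence distribution N(x, 1) assigns to [a, b]."""
    return norm_cdf(b - x) - norm_cdf(a - x)


# Illustrative values (assumed): the narrow interval lies inside the wide one.
x_obs = 2.18
narrow = (-0.50, 0.50)
wide = (-0.82, 0.52)

p_narrow = interval_p_value(x_obs, *narrow)  # about 0.0502
p_wide = interval_p_value(x_obs, *wide)      # about 0.0498
cd_narrow = cd_mass(x_obs, *narrow)          # about 0.0428
cd_wide = cd_mass(x_obs, *wide)              # about 0.0471

print(f"p-value, narrow H: {p_narrow:.4f}   wide H: {p_wide:.4f}")
print(f"CD mass, narrow H: {cd_narrow:.4f}   wide H: {cd_wide:.4f}")
```

Under these assumptions the p-value is smaller for the wider hypothesis even though that hypothesis contains the narrower one, so at the 0.05 level the larger set would be rejected while its subset is not. The CD mass cannot behave this way, since the mass of a set is never smaller than the mass of one of its subsets.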