Confidence distributions and hypothesis testing

Eugenio Melilli, Piero Veronese

DOI: https://doi.org/10.1007/s00362-024-01542-4

2024-03-29

Statistical Papers

Abstract: The traditional frequentist approach to hypothesis testing has recently come under extensive debate, raising several critical concerns. Additionally, practical applications often blend the decision-theoretical framework pioneered by Neyman and Pearson with the inductive inferential process based on the p-value, as advocated by Fisher. The combination of the two methods has led to interpreting the p-value as both an observed error rate and a measure of empirical evidence for the hypothesis. Unfortunately, both interpretations pose difficulties. In this context, we propose that resorting to confidence distributions can offer a valuable solution to address many of these critical issues. Rather than suggesting an automatic procedure, we present a natural approach to tackle the problem within a broader inferential context. Through the use of confidence distributions, we show the possibility of defining two statistical measures of evidence that align with different types of hypotheses under examination. These measures, unlike the p-value, exhibit coherence, simplicity of interpretation, and ease of computation, as exemplified by various illustrative examples spanning diverse fields. Furthermore, we provide theoretical results that establish connections between our proposal, other measures of evidence given in the literature, and standard testing concepts such as size, optimality, and the p-value.
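To make "confidence distribution" concrete, here is a minimal sketch for the textbook case of a normal mean with known standard deviation, where the CD is H(μ) = Φ((μ − x̄)/(σ/√n)). It shows two standard CD facts: its quantiles reproduce the usual z confidence bounds, and it assigns a probability ("confidence") to an interval hypothesis. The model, the data values, and the helper names (`normal_mean_cd`, `normal_mean_cd_quantile`, `cd_mass`) are illustrative assumptions, not the paper's notation or its exact measures of evidence.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: Phi and Phi^{-1}


def normal_mean_cd(xbar: float, sigma: float, n: int):
    """Confidence distribution for a normal mean with known sigma (assumed model).

    H(mu) = Phi((mu - xbar) / (sigma / sqrt(n))) is a CDF over the parameter mu.
    """
    se = sigma / sqrt(n)
    return lambda mu: STD_NORMAL.cdf((mu - xbar) / se)


def normal_mean_cd_quantile(xbar: float, sigma: float, n: int, alpha: float) -> float:
    """alpha-quantile of the same CD: H^{-1}(alpha) = xbar + se * Phi^{-1}(alpha)."""
    return xbar + (sigma / sqrt(n)) * STD_NORMAL.inv_cdf(alpha)


def cd_mass(H, lower: float, upper: float) -> float:
    """CD probability of the interval hypothesis lower <= mu <= upper."""
    return H(upper) - H(lower)


# Hypothetical data: sample mean 10.4 from n = 25 observations, known sigma = 2.
xbar, sigma, n = 10.4, 2.0, 25
H = normal_mean_cd(xbar, sigma, n)

# The 2.5% and 97.5% CD quantiles give the usual 95% z interval xbar +/- 1.96 * se.
ci = (normal_mean_cd_quantile(xbar, sigma, n, 0.025),
      normal_mean_cd_quantile(xbar, sigma, n, 0.975))

# CD mass attached to the interval hypothesis 9.5 <= mu <= 10.5.
conf_h0 = cd_mass(H, 9.5, 10.5)
print(ci, round(conf_h0, 4))
```

Because H is a genuine distribution over μ, the mass it gives a hypothesis is read directly as a degree of confidence in that hypothesis, rather than as a tail probability over hypothetical data.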

statistics & probability

What problem does this paper attempt to address?

The paper addresses several key issues in traditional frequentist hypothesis testing:

1. **Conflicting interpretations of p-values**: p-values are widely used in practice to quantify the evidence the data provide about the null hypothesis, but their interpretation is controversial. They are read both as an observed error rate and as a measure of empirical evidence for the hypothesis, and both readings are problematic.
2. **Blending the decision-theoretic framework with inductive inference**: In practice, the Neyman-Pearson decision-theoretic framework is combined with Fisher's inductive inferential process based on the p-value, and this combination is a source of misunderstanding about what a p-value means.
3. **Incoherence of p-values**: As a measure of statistical evidence, p-values lack coherence. For example, Schervish (1996) showed that, for the same observed data, p-values computed for different hypotheses (for instance, one hypothesis contained in another) can be mutually incoherent.

To address these issues, the authors propose using **confidence distributions (CDs)**. From a CD, two statistical measures of evidence can be defined, each suited to a different type of hypothesis under examination. These measures have the following characteristics:

- **Coherence**: unlike p-values, they behave coherently across different hypotheses.
- **Ease of interpretation**: they are simpler to understand and interpret.
- **Ease of computation**: they are relatively simple to compute.

The authors illustrate these measures with examples from several fields and provide theoretical results connecting them to other measures of evidence in the literature and to standard testing concepts such as size, optimality, and the p-value.
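The link between CD-based quantities and p-values, and the coherence property that p-values can lack, can be sketched in the same known-σ normal-mean model. In this simple case two standard CD quantities coincide numerically with familiar p-values: the CD mass H(μ0) of the one-sided hypothesis μ ≤ μ0 equals the one-sided z-test p-value, and 2·min{H(μ0), 1 − H(μ0)} equals the two-sided p-value for μ = μ0. The sketch also checks that CD mass is monotone over nested interval hypotheses. These quantities and all function names (`cd_value`, `precise_cd_measure`, `interval_cd_mass`, etc.) are assumptions chosen for illustration; the paper's own two measures may be defined differently.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal


def cd_value(mu0, xbar, sigma, n):
    """H(mu0) for the normal-mean CD with known sigma (assumed model)."""
    return Z.cdf((mu0 - xbar) / (sigma / sqrt(n)))


def one_sided_p(mu0, xbar, sigma, n):
    """p-value of the z test of H0: mu <= mu0 against H1: mu > mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 1 - Z.cdf(z)


def two_sided_p(mu0, xbar, sigma, n):
    """p-value of the z test of H0: mu = mu0 against H1: mu != mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - Z.cdf(abs(z)))


def precise_cd_measure(mu0, xbar, sigma, n):
    """2 * min(H(mu0), 1 - H(mu0)): how central mu0 sits within the CD."""
    h = cd_value(mu0, xbar, sigma, n)
    return 2 * min(h, 1 - h)


def interval_cd_mass(a, b, xbar, sigma, n):
    """CD mass of the interval hypothesis a <= mu <= b."""
    return cd_value(b, xbar, sigma, n) - cd_value(a, xbar, sigma, n)


# Hypothetical data for a one-sample z setting.
xbar, sigma, n = 10.4, 2.0, 25

for mu0 in (9.0, 10.0, 10.4, 11.2):
    print(f"mu0={mu0}: H(mu0)={cd_value(mu0, xbar, sigma, n):.4f}, "
          f"one-sided p={one_sided_p(mu0, xbar, sigma, n):.4f}, "
          f"2min={precise_cd_measure(mu0, xbar, sigma, n):.4f}, "
          f"two-sided p={two_sided_p(mu0, xbar, sigma, n):.4f}")

# Coherence for nested interval hypotheses: a smaller hypothesis never
# receives more CD mass than a larger one that contains it.
inner = interval_cd_mass(9.8, 10.2, xbar, sigma, n)
outer = interval_cd_mass(9.5, 10.5, xbar, sigma, n)
print(inner <= outer)
```

Monotonicity over nested hypotheses holds for any CD because it is a probability distribution on the parameter; this is the kind of coherence that p-values for interval hypotheses need not satisfy.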

Related papers

P values, confidence intervals, or confidence levels for hypotheses?

Confidences in Hypotheses

p-Value as the Strength of Evidence Measured by Confidence Distribution

An Entropy-Based Approach for Nonparametrically Testing Simple Probability Distribution Hypotheses

Post-hoc Hypothesis Testing

On Some Assumptions of the Null Hypothesis Statistical Testing

Dempster-Shafer P-values: Thoughts on an Alternative Approach for Multinomial Inference

P-value: A Bless or A Curse for Evidence-Based Studies?

Hypothesis Testing in Econometrics

Towards a theory for testing statistical hypothesis: Multivariate mean with nuisance covariance matrix

Beyond Neyman-Pearson: E-values enable hypothesis testing with a data-driven alpha

Interval estimation, point estimation, and null hypothesis significance testing calibrated by an estimated posterior probability of the null hypothesis

Multiple testing of composite null hypotheses for discrete data using randomized $p$-values

Sequential Tests of Statistical Hypotheses with Confidence Limits

Semantic and Cognitive Tools to Aid Statistical Science: Replace Confidence and Significance by Compatibility and Surprise

Safe Testing

Statistical significance testing for mixed priors: a combined Bayesian and frequentist analysis

A frequentist two-sample test based on Bayesian model selection

Higher Accuracy for Bayesian and Frequentist Inference: Large Sample Theory for Small Sample Likelihood

Hypothesis Testing with Finite Statistics