Abstract: The paper introduces a \(p\)-value that summarizes the evidence against a rival causal theory that explains an observed outcome in a single case. We show how to represent the probability distribution characterizing a theorized rival hypothesis (the null) in the absence of randomization of treatment and when relying on qualitative data, for instance when conducting process tracing. As in Fisher's \autocite*{fisher1935design} original design, our \(p\)-value indicates how frequently one would find the same observations or even more favorable observations under a theory that is compatible with our observations but antagonistic to the working hypothesis. We also present an extension that allows researchers to assess the sensitivity of their results to confirmation bias. Finally, we illustrate the application of our hypothesis test using the study by Snow \autocite*{Snow1855} about the cause of cholera in Soho, a classic in process tracing, epidemiology, and microbiology. Our framework suits any type of case study and evidence, such as data from interviews, archives, or participant observation.

What problem does this paper attempt to address?

The paper addresses how to use p-values to evaluate causal explanations in single-case studies, where no randomized experiment is available. It proposes a method for summarizing the evidence against a competing causal theory that also explains the observed outcome (the "null hypothesis"). The method does not require randomized treatment and applies to studies that rely on qualitative data, such as process tracing.

The paper makes two main contributions. First, it shows how to represent the probability distribution of a competing hypothesis when treatment is not randomized and the evidence is qualitative, such as interviews, archives, or participant observation. Second, it provides a conservative p-value that measures how strongly the evidence favors the working hypothesis over the rival. In addition, the paper proposes a sensitivity analysis to assess how far the findings depend on confirmation bias.

The paper illustrates the framework by applying it to Snow's (1855) classic study on the cause of cholera. Snow's theory held that cholera was caused by contaminated water, and his strongest evidence came from a series of interviews, public registers, and a map. His conclusions were nonetheless questioned at the time, because no test was available to refute the then-popular miasma theory.

Structurally, the paper begins with an introduction that outlines the research motivation and background. The second part defines the main concepts in causal inference and sets out the problem of inferring unobserved counterfactuals in single-case studies. The third part introduces a null model for generating null hypotheses, used to weigh the evidence for the working hypothesis against competing explanations. Finally, the fourth part discusses how to assess the sensitivity of the results to confirmation bias. It introduces a biased null model based on the non-central hypergeometric distribution, which allows some types of evidence to be obtained more easily than others and thus better reflects the conditions of real-world research.
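
To make the idea of a null model with a biased variant concrete, the following is a minimal sketch and not the paper's own procedure. It assumes an urn-style setup that the summary does not spell out: a hypothetical pool of `total` possible pieces of evidence, of which `favorable` support the working hypothesis, from which the researcher collects `draws` pieces and finds `observed` favorable ones. The function names, the parameter `odds`, and the choice of Fisher's variant of the non-central hypergeometric distribution are all assumptions made for illustration.

```python
from math import comb


def fisher_nchypergeom_pmf(x, total, favorable, draws, odds=1.0):
    """P(X = x) under Fisher's non-central hypergeometric distribution.

    Assumed setup (hypothetical, for illustration only): `favorable` of the
    `total` possible pieces of evidence support the working hypothesis, and
    `draws` pieces are collected. `odds` > 1 means favorable evidence is
    easier to obtain; `odds` = 1 gives the ordinary hypergeometric null.
    """
    lo = max(0, draws - (total - favorable))  # smallest feasible count
    hi = min(draws, favorable)                # largest feasible count
    if not lo <= x <= hi:
        return 0.0

    def weight(y):
        return comb(favorable, y) * comb(total - favorable, draws - y) * odds ** y

    return weight(x) / sum(weight(y) for y in range(lo, hi + 1))


def upper_tail_p_value(observed, total, favorable, draws, odds=1.0):
    """Probability of `observed` or more favorable pieces under the null."""
    hi = min(draws, favorable)
    return sum(
        fisher_nchypergeom_pmf(x, total, favorable, draws, odds)
        for x in range(observed, hi + 1)
    )


if __name__ == "__main__":
    # Hypothetical numbers: 10 possible pieces of evidence, 5 favorable,
    # 5 collected, all 5 collected pieces favorable.
    for odds in (1.0, 2.0, 5.0):
        p = upper_tail_p_value(observed=5, total=10, favorable=5, draws=5, odds=odds)
        print(f"odds={odds}: p={p:.4f}")
```

Under these assumptions, raising `odds` above 1 mimics a researcher who finds favorable evidence more easily. The p-value then grows, so the sweep over `odds` shows how much bias would be needed before the result stops looking unusual under the rival theory.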

A p-value for Process Tracing and other N=1 Studies

P-value: A Bless or A Curse for Evidence-Based Studies?

p-Value as the Strength of Evidence Measured by Confidence Distribution

Dempster-Shafer P-values: Thoughts on an Alternative Approach for Multinomial Inference

High-Dimensional Randomized Crossover Studies: A Clarification of P-Values Interpretation

Thou Shalt Not Reject the P-value

P value functions: An underused method to present research results and to promote quantitative reasoning

Multiple testing of composite null hypotheses for discrete data using randomized $p$-values

p-Values for Credibility

Selective inference is easier with p-values

Understanding p-values and significance

Post-hoc Hypothesis Testing

The evidence contained in the P-value is context dependent

Valid p-Values and Expectations of p-Values Revisited

Randomized p-values for multiple testing and their application in replicability analysis

A Likelihood-based Alternative to Null Hypothesis Significance Testing

P values, confidence intervals, or confidence levels for hypotheses?

The Practical Alternative to the p Value Is the Correctly Used p Value

Smaller $p$-values via indirect information

Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses

Confidence distributions and hypothesis testing