The assessment of replicability using the sum of p-values

Leonhard Held, Samuel Pawel, Charlotte Micheloud
DOI: https://doi.org/10.1098/rsos.240149
IF: 3.5
2024-08-30
Royal Society Open Science
Abstract: Statistical significance of both the original and the replication study is a commonly used criterion to assess replication attempts, also known as the two-trials rule in drug development. However, replication studies are sometimes conducted although the original study is non-significant, in which case Type-I error rate control across both studies is no longer guaranteed. We propose an alternative method to assess replicability using the sum of p-values from the two studies. The approach provides a combined p-value and can be calibrated to control the overall Type-I error rate at the same level as the two-trials rule but allows for replication success even if the original study is non-significant. The unweighted version requires a less restrictive level of significance at replication if the original study is already convincing, which facilitates sample size reductions of up to 10%. Downweighting the original study accounts for possible bias and requires a more stringent significance level and larger sample sizes at replication. Data from four large-scale replication projects are used to illustrate and compare the proposed method with the two-trials rule, meta-analysis and Fisher’s combination method.
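The calibration described in the abstract can be illustrated with a minimal sketch of the unweighted case. It is not the authors' code. It assumes one-sided p-values that are independent and uniform under the null, and a per-study level of alpha = 0.025, so that the two-trials rule has overall Type-I error rate alpha^2. Under these assumptions the sum of the two p-values has the distribution function s^2/2 for s <= 1, which gives a threshold of sqrt(2) * alpha for the sum. The function names are hypothetical.

```python
import math

ALPHA = 0.025  # assumed one-sided per-study significance level


def sum_threshold(alpha=ALPHA):
    """Bound b on p_o + p_r such that P(p_o + p_r <= b) = alpha**2 under the null."""
    return math.sqrt(2) * alpha


def combined_p(p_o, p_r):
    """Combined p-value: CDF of the sum of two independent uniform p-values."""
    s = p_o + p_r
    if s <= 1:
        return s ** 2 / 2
    return 1 - (2 - s) ** 2 / 2


def success_sum(p_o, p_r, alpha=ALPHA):
    """Replication success by the (unweighted) sum of p-values."""
    return p_o + p_r <= sum_threshold(alpha)


def success_two_trials(p_o, p_r, alpha=ALPHA):
    """Replication success by the two-trials rule: both studies significant."""
    return p_o <= alpha and p_r <= alpha


# Hypothetical p-value pairs (original, replication), not data from the paper.
for p_o, p_r in [(0.03, 0.004), (0.001, 0.03), (0.02, 0.02)]:
    print(p_o, p_r, success_sum(p_o, p_r), success_two_trials(p_o, p_r))
```

In this sketch the first pair succeeds under the sum rule although the original study is non-significant at 0.025, the second pair shows the less restrictive replication level when the original study is convincing, and the third pair passes the two-trials rule but not the sum rule. The weighted version mentioned in the abstract is not covered here.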
multidisciplinary sciences