Abstract: Much statistical teaching and many research reports focus on the 'null hypothesis significance test'. Yet the correct meaning and interpretation of statistical significance tests is elusive. Misinterpretations are both common and persistent, leading many to question whether significance tests should be used at all. While most take aim at the arbitrary declaration of p < 0.05 as a threshold for determining 'significance', others extend the critique to suggest the 'p-value' should be dispensed with entirely. P-values and significance tests are still widely used as if they give a measure of the size and importance of relationships, even though this misunderstanding has been observed and discussed for many years. We argue that p-values and significance tests are intrinsically misleading. Point estimates of relationships and confidence intervals give direct information about the effect and the uncertainty of the estimate without recourse to interpreting how a particular p-value might have arisen or indeed referring to them at all. In this paper we briefly outline some of the problems with significance testing, offer a number of examples selected from a recent issue of the International Journal of Nursing Studies and discuss some proposed responses to these problems. We conclude by offering some guidance to authors reporting statistical tests in journals and present a position statement that has been adopted by the International Journal of Nursing Studies to guide its authors in reporting the results of statistical analyses.
While stopping short of calling for an outright ban on reporting p-values and significance tests, we urge authors (and journals) to place more emphasis on measures of effect and estimates of precision/uncertainty and, following the position of the American Statistical Association, emphasise that authors (and readers) should avoid using 0.05 or any other cut-off for a p-value as the basis for a decision about the meaningfulness/importance of an effect. If point estimates and confidence intervals are used, then the p-value may be redundant and can be omitted from reports. When authors talk about 'significance', they need to be explicit when referring to statistical significance, and we recommend authors adopt the language of 'importance' when talking about effect sizes to avoid any confusion.
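
As an illustration of the reporting style recommended above, the following minimal sketch reports a difference in means as a point estimate with a 95% confidence interval and no p-value. It is not taken from the paper: the function name `mean_difference_ci`, the two data lists and the use of a large-sample normal approximation with unpooled variances are all assumptions made for this example, and the numbers are invented purely for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(group_a, group_b, level=0.95):
    """Return (estimate, lower, upper) for mean(group_a) - mean(group_b).

    Assumption: a large-sample normal approximation with unpooled
    variances. For small samples a t-based interval would be wider.
    """
    estimate = mean(group_a) - mean(group_b)
    # Standard error of the difference between two independent means
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    # Two-sided critical value, e.g. about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, estimate - z * se, estimate + z * se


# Hypothetical illustrative data, not drawn from any study
intervention = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 12.7, 11.8]
control = [10.2, 9.1, 10.8, 9.9, 11.0, 9.4, 10.5, 10.3]

estimate, lower, upper = mean_difference_ci(intervention, control)
print(f"Mean difference: {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A report written this way lets the reader judge the size of the effect and the precision of the estimate directly, and then discuss its importance, without reference to a 0.05 threshold.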

Understanding p-values and significance

Thou Shalt Not Reject the P-value

Statistical significance testing and p-values: Defending the indefensible? A discussion paper and position statement

Alternatives to the P value: connotations of significance

High-Dimensional Randomized Crossover Studies: A Clarification of P-Values Interpretation

Why not to (over)emphasize statistical significance.

The Search for Significance: A Few Peculiarities in the Distribution of P Values in Experimental Psychology Literature

New relevance and significance measures to replace p-values

Addressing Common Misuses and Pitfalls of P values in Biomedical Research

About statistical significance, and the lack thereof

Reporting and interpreting non-significant results in animal cognition research

Evidence-based medicine or statistically manipulated medicine? Are we slaves to the P-value?

Study design: think 'scientific value' not 'p-values'

[Statistical P values do not dominate scientific research].

P-value: A Bless or A Curse for Evidence-Based Studies?

P-value, compatibility, and S-value

Reexamining Statistical Significance and P-Values in Nursing Research: Historical Context and Guidance for Interpretation, Alternatives, and Reporting

When More Is Less: Pitfalls of significance testing

Discuss practical importance of results based on interval estimates and p-value functions, not only on point estimates and null p-values

Valid p-Values and Expectations of p-Values Revisited

A Likelihood-based Alternative to Null Hypothesis Significance Testing