Abstract: Much statistical teaching and many research reports focus on the 'null hypothesis significance test'. Yet the correct meaning and interpretation of statistical significance tests are elusive. Misinterpretations are both common and persistent, leading many to question whether significance tests should be used at all. While most critics take aim at the arbitrary declaration of p < 0.05 as a threshold for determining 'significance', others extend the critique to suggest the 'p-value' should be dispensed with entirely. P-values and significance tests are still widely used as if they give a measure of the size and importance of relationships, even though this misunderstanding has been observed and discussed for many years. We argue that p-values and significance tests are intrinsically misleading. Point estimates of relationships and confidence intervals give direct information about the effect and the uncertainty of the estimate without recourse to interpreting how a particular p-value might have arisen or indeed referring to p-values at all. In this paper we briefly outline some of the problems with significance testing, offer a number of examples selected from a recent issue of the International Journal of Nursing Studies and discuss some proposed responses to these problems. We conclude by offering some guidance to authors reporting statistical tests in journals and present a position statement that has been adopted by the International Journal of Nursing Studies to guide its authors in reporting the results of statistical analyses.
While stopping short of calling for an outright ban on reporting p-values and significance tests, we urge authors (and journals) to place more emphasis on measures of effect and estimates of precision/uncertainty and, following the position of the American Statistical Association, emphasise that authors (and readers) should avoid using 0.05 or any other cut-off for a p-value as the basis for a decision about the meaningfulness/importance of an effect. If point estimates and confidence intervals are used, then the p-value may be redundant and can be omitted from reports. When authors talk about 'significance' they need to be explicit when referring to statistical significance, and we recommend authors adopt the language of 'importance' when talking about effect sizes to avoid any confusion.
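The contrast between a p-value and an estimate with its interval can be illustrated with a minimal sketch. It is not taken from the paper: the function name `mean_difference_summary`, the summary statistics and the sample sizes are all hypothetical, and the interval uses a large-sample normal approximation rather than a t distribution, which makes it rough for the smaller sample.

```python
import math
from statistics import NormalDist


def mean_difference_summary(mean_a, mean_b, sd_a, sd_b, n_a, n_b, level=0.95):
    """Summarise a difference in two group means.

    Returns (point estimate, confidence interval, two-sided p-value),
    all based on a large-sample normal approximation (an assumption
    made for this sketch, not a method prescribed by the paper).
    """
    diff = mean_a - mean_b
    # Standard error of the difference between two independent means.
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    # Critical value for the requested confidence level (about 1.96 for 95%).
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    # Two-sided p-value against the null hypothesis of no difference.
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, ci, p_value


# Hypothetical summary data: a tiny difference in a very large study
# and a much larger difference in a small study.
tiny_effect_big_study = mean_difference_summary(50.5, 50.0, 10, 10, 20000, 20000)
big_effect_small_study = mean_difference_summary(55.0, 50.0, 10, 10, 20, 20)

for label, (diff, ci, p) in [
    ("tiny effect, n = 20000 per group", tiny_effect_big_study),
    ("large effect, n = 20 per group", big_effect_small_study),
]:
    print(f"{label}: difference = {diff:.2f}, "
          f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p:.2g}")
```

With these made-up numbers the half-point difference falls below p = 0.05 while the five-point difference does not, so a reader looking only at 'significance' would rank the two results in the opposite order to their size. The point estimates and intervals show both the magnitude of each effect and how precisely it has been estimated.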

"Medium-n studies" in computing education conferences

Semantic and Cognitive Tools to Aid Statistical Science: Replace Confidence and Significance by Compatibility and Surprise

Statistical significance testing and p-values: Defending the indefensible? A discussion paper and position statement

A simple statistical framework for small sample studies

From significance testing to estimation and Open Science: How esci can help

Fair Statistical Communication in HCI

Too Big to Fail: Larger Samples and False Discoveries

Connecting Simple and Precise P-values to Complex and Ambiguous Realities

The power and pitfalls of underpowered studies

The Search for Significance: A Few Peculiarities in the Distribution of P Values in Experimental Psychology Literature

Discuss practical importance of results based on interval estimates and p-value functions, not only on point estimates and null p-values

Research Commentary - Too Big to Fail: Large Samples and the p-Value Problem

When More Is Less: Pitfalls of significance testing

Inferential Statistics in Computing Education Research: A Methodological Review

Invited Commentary: The Need for Cognitive Science in Methodology

When Null Hypothesis Significance Testing Is Unsuitable for Research: A Reassessment

Testing in the Presence of Nuisance Parameters: Some Comments on Tests Post-Model-Selection and Random Critical Values

Assumption-checking rather than (just) testing: The importance of visualization and effect size in statistical diagnostics

Simple solution to a common statistical problem: Interpreting multiple tests

Testing Theories with Big Data: A SuperPower Approach

The Practical Alternative to the p Value Is the Correctly Used p Value