Abstract: Much statistical teaching and many research reports focus on the 'null hypothesis significance test'. Yet the correct meaning and interpretation of statistical significance tests are elusive. Misinterpretations are both common and persistent, leading many to question whether significance tests should be used at all. While most critics take aim at the arbitrary declaration of p < 0.05 as a threshold for determining 'significance', others extend the critique to suggest the 'p-value' should be dispensed with entirely. P-values and significance tests are still widely used as if they give a measure of the size and importance of relationships, even though this misunderstanding has been observed and discussed for many years. We argue that p-values and significance tests are intrinsically misleading. Point estimates of relationships and confidence intervals give direct information about the effect and the uncertainty of the estimate, without recourse to interpreting how a particular p-value might have arisen or indeed referring to p-values at all. In this paper we briefly outline some of the problems with significance testing, offer a number of examples selected from a recent issue of the International Journal of Nursing Studies and discuss some proposed responses to these problems. We conclude by offering some guidance to authors reporting statistical tests in journals and present a position statement that has been adopted by the International Journal of Nursing Studies to guide its authors in reporting the results of statistical analyses.
While stopping short of calling for an outright ban on reporting p-values and significance tests, we urge authors (and journals) to place more emphasis on measures of effect and estimates of precision/uncertainty and, following the position of the American Statistical Association, emphasise that authors (and readers) should avoid using 0.05 or any other cut-off for a p-value as the basis for a decision about the meaningfulness/importance of an effect. If point estimates and confidence intervals are used, then the p-value may be redundant and can be omitted from reports. When authors talk about 'significance', they need to be explicit about whether they are referring to statistical significance, and we recommend authors adopt the language of 'importance' when talking about effect sizes to avoid any confusion.
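As an illustration of reporting a point estimate with a confidence interval in place of a bare p-value, the following is a minimal sketch only. The function name `mean_difference_ci`, the two groups and their scores are all hypothetical and invented for this example; none of them come from the paper. The interval uses a large-sample normal approximation with an unpooled standard error, so for small samples a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(a, b, level=0.95):
    """Return (estimate, lower, upper) for mean(a) - mean(b).

    Assumption: normal approximation with an unpooled standard error.
    This is an illustrative choice, not a method taken from the paper.
    """
    estimate = mean(a) - mean(b)
    # Standard error of the difference between two independent sample means
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    # Two-sided critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, estimate - z * se, estimate + z * se


# Hypothetical outcome scores, invented purely for illustration
intervention = [7.1, 6.4, 8.0, 7.7, 6.9, 7.5, 8.2, 6.8]
control = [6.2, 6.9, 5.8, 7.0, 6.1, 6.6, 5.9, 6.4]

est, low, high = mean_difference_ci(intervention, control)
print(f"Mean difference: {est:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Reported this way, the reader sees both the size of the difference and the range of values compatible with the data, and can judge its importance directly rather than against a 0.05 threshold.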

Fair Statistical Communication in HCI

Semantic and Cognitive Tools to Aid Statistical Science: Replace Confidence and Significance by Compatibility and Surprise

Effect Sizes and Power Analysis in HCI

Connecting Simple and Precise P-values to Complex and Ambiguous Realities

P value functions: An underused method to present research results and to promote quantitative reasoning

Statistics and explainability: a fruitful alliance

"Medium-n studies" in computing education conferences

Supporting the Design and Analysis of HCI Experiments

Discuss practical importance of results based on interval estimates and p-value functions, not only on point estimates and null p-values

Addressing Common Misuses and Pitfalls of P values in Biomedical Research

Invited Commentary: The Need for Cognitive Science in Methodology

The Practical Alternative to the p Value Is the Correctly Used p Value

Ten Points for High-Quality Statistical Reporting and Data Presentation

Statistical significance testing and p-values: Defending the indefensible? A discussion paper and position statement

Common Statistical Errors in Scientific Investigations: A Simple Guide to Avoid Unfounded Decisions

Are ChatGPT and Copilot Reliable for Health Education on Statistical Testing?

Visualization According to Statisticians: An Interview Study on the Role of Visualization for Inferential Statistics

Key attributes of a modern statistical computing tool

Another Look at Confidence Intervals: Proposal for a More Relevant and Transparent Approach

From significance testing to estimation and Open Science: How esci can help

Optimization Hierarchy for Fair Statistical Decision Problems