Abstract: Researchers often misinterpret and misrepresent statistical outputs. This abuse has led to a large literature on modification or replacement of testing thresholds and $P$-values with confidence intervals, Bayes factors, and other devices. Because the core problems appear cognitive rather than statistical, we review simple aids to statistical interpretations. These aids emphasize logical and information concepts over probability, and thus may be more robust to common misinterpretations than are traditional descriptions. We use the Shannon transform of the $P$-value $p$, also known as the binary surprisal or $S$-value $s=-\log_{2}(p)$, to measure the information supplied by the testing procedure, and to help calibrate intuitions against simple physical experiments like coin tossing. We also use tables or graphs of test statistics for alternative hypotheses, and interval estimates for different percentile levels, to thwart fallacies arising from arbitrary dichotomies. Finally, we reinterpret $P$-values and interval estimates in unconditional terms, which describe compatibility of data with the entire set of analysis assumptions. We illustrate these methods with a reanalysis of data from an existing record-based cohort study. In line with other recent recommendations, we advise that teaching materials and research reports discuss $P$-values as measures of compatibility rather than significance, compute $P$-values for alternative hypotheses whenever they are computed for null hypotheses, and interpret interval estimates as showing values of high compatibility with data, rather than regions of confidence. Our recommendations emphasize cognitive devices for displaying the compatibility of the observed data with various hypotheses of interest, rather than focusing on single hypothesis tests or interval estimates. We believe these simple reforms are well worth the minor effort they require.
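As a minimal illustration of the $S$-value transform described in the abstract, the Python sketch below converts a few $P$-values into bits of information and into the roughly equivalent number of consecutive heads from tosses of a fair coin (since $2^{-s} = p$). Only the formula $s = -\log_2(p)$ comes from the text; the function names `s_value` and `equivalent_heads` are hypothetical helpers chosen for illustration.

```python
import math


def s_value(p: float) -> float:
    """Binary surprisal (S-value) of a P-value: s = -log2(p), in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def equivalent_heads(p: float) -> int:
    """Hypothetical helper: round the S-value to a whole number of coin tosses.

    Seeing k heads in a row from a fair coin has probability 2**-k, so a
    P-value p carries about as much information against the test hypothesis
    (and its assumptions) as k = round(s_value(p)) consecutive heads.
    """
    return round(s_value(p))


if __name__ == "__main__":
    for p in (0.5, 0.25, 0.05, 0.005):
        print(f"p = {p:<6} s = {s_value(p):.2f} bits, "
              f"about {equivalent_heads(p)} heads in a row")
```

On this scale a $P$-value of 0.05 carries only a little over 4 bits of information, comparable to getting 4 heads in a row from a coin assumed to be fair.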

What problem does this paper attempt to address?

The main problem this paper addresses is the misinterpretation and misuse of traditional statistical outputs such as $P$-values and confidence intervals. The authors argue that these problems are less technical difficulties than cognitive challenges. The paper therefore proposes several simple aids to help researchers interpret statistical results more accurately. These aids emphasize logical and information concepts rather than probability, with the aim of reducing common misunderstandings.

### Background and Core Issues of the Paper

1. **Background**:
   - Researchers often misread and misuse statistical outputs, which has produced a large literature on how to modify or replace traditional hypothesis-testing thresholds and $P$-values.
   - Because the core problem is more cognitive than statistical, traditional remedies have often had limited effect.
2. **Core Issues**:
   - Traditional terms such as "significance", "non-significance", and "confidence interval" invite misunderstanding.
   - Researchers tend to take extra leaps and shortcuts, so improving practice requires anticipating how terms and interpretations will be read.
   - Although some journals strongly opposed reporting $P$-values many years ago, misunderstandings about statistical significance persist.

### Solutions

1. **Using Compatibility and Surprise Values**:
   - The authors use the Shannon transform of the $P$-value, the binary surprisal or $S$-value $s = -\log_2(p)$, to measure the information supplied by the test procedure and to help calibrate intuition against simple physical experiments such as coin tossing.
   - Tables or graphs of test statistics under different hypotheses, and interval estimates at different percentile levels, help avoid fallacies caused by arbitrary dichotomies.
2. **Reinterpreting $P$-Values and Interval Estimates**:
   - $P$-values and interval estimates are described in unconditional terms, as measures of the compatibility of the data with the entire set of analysis assumptions, and compatibility is displayed for various hypotheses of interest rather than through a single hypothesis test or interval estimate.
3. **Suggestions for Improving Teaching Materials and Research Reports**:
   - Describe $P$-values as measures of compatibility rather than significance.
   - Compute $P$-values for alternative hypotheses whenever they are computed for the null hypothesis.
   - Interpret interval estimates as showing values highly compatible with the data, rather than as regions of confidence.

### Example Applications

The paper demonstrates these methods by reanalyzing data from a record-based cohort study of the association between use of selective serotonin reuptake inhibitors (SSRIs) during pregnancy and autism spectrum disorder (ASD) in offspring. By computing $P$-values and $S$-values under different hypotheses, the authors show how to interpret such results more accurately and avoid common misunderstandings.

### Conclusion

The authors conclude that these simple reforms are well worth the minor effort they require. Used consistently, they can reduce misreadings of traditional statistical outputs and thereby improve the accuracy and reliability of scientific research.
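The recommendation to compute $P$-values for many hypotheses, not only the null, and to report intervals at several levels can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the estimate is taken to be a hazard ratio analysed with a normal (Wald) approximation on the log scale, and `HR_HAT` and `SE_LOG_HR` are hypothetical illustrative numbers, not the results of the SSRI/ASD study. The helper names are likewise hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

# Hypothetical inputs, NOT taken from the study: a hazard-ratio estimate and
# the standard error of its logarithm (e.g. as reported by a Cox model).
HR_HAT = 1.6
SE_LOG_HR = 0.25


def p_value(hr_test: float) -> float:
    """Two-sided P-value for the test hypothesis HR = hr_test.

    Uses a normal approximation on the log scale; erfc(|z|/sqrt(2)) equals
    2 * (1 - Phi(|z|)) but avoids cancellation for large |z|.
    """
    z = (math.log(HR_HAT) - math.log(hr_test)) / SE_LOG_HR
    return math.erfc(abs(z) / math.sqrt(2.0))


def s_value(p: float) -> float:
    """S-value in bits; log2(1/p) equals -log2(p) but prints 0.00, not -0.00."""
    return math.log2(1.0 / p)


def compatibility_interval(level: float) -> tuple:
    """HR values whose two-sided P-value is at least 1 - level."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    return (HR_HAT * math.exp(-z * SE_LOG_HR), HR_HAT * math.exp(z * SE_LOG_HR))


if __name__ == "__main__":
    # P-values and S-values for several test hypotheses, not just HR = 1.
    print("HR tested   P-value   S-value (bits)")
    for hr in (0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.0):
        p = p_value(hr)
        print(f"{hr:>9.2f}   {p:7.3f}   {s_value(p):6.2f}")

    # Interval estimates at several levels instead of a single 95% cutoff.
    for level in (0.50, 0.80, 0.90, 0.95, 0.99):
        lo, hi = compatibility_interval(level)
        print(f"{level:.0%} compatibility interval: {lo:.2f} to {hi:.2f}")
```

Under these hypothetical inputs, the point estimate itself has $P = 1$ ($S = 0$ bits), the null value HR = 1 appears as just one row among many, and each interval contains exactly the tested values whose $P$-value is at least one minus its level.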

Semantic and Cognitive Tools to Aid Statistical Science: Replace Confidence and Significance by Compatibility and Surprise

Connecting Simple and Precise P-values to Complex and Ambiguous Realities

P-value, compatibility, and S-value

Fair Statistical Communication in HCI

Invited Commentary: The Need for Cognitive Science in Methodology

Another Look at Confidence Intervals: Proposal for a More Relevant and Transparent Approach

Can visualization alleviate dichotomous thinking? Effects of visual representations on the cliff effect

Assumption-checking rather than (just) testing: The importance of visualization and effect size in statistical diagnostics

P value functions: An underused method to present research results and to promote quantitative reasoning

Thou Shalt Not Reject the P-value

From significance testing to estimation and Open Science: How esci can help

Simple solution to a common statistical problem: Interpreting multiple tests

Beyond Psychology: Prevalence of P Value and Confidence Interval Misinterpretation Across Different Fields

Statistical significance testing and p-values: Defending the indefensible? A discussion paper and position statement

Addressing Common Misuses and Pitfalls of P values in Biomedical Research

Beyond 'statistical significance': A nontechnical primer of Bayesian statistics and Bayes factors for health researchers

Confidence distributions and hypothesis testing

The Practical Alternative to the p Value Is the Correctly Used p Value

Valid p-Values and Expectations of p-Values Revisited

Changing the paradigm of fixed significance levels: Testing Hypothesis by Minimizing Sum of Errors Type I and Type II

Understanding p-values and significance