Semantic and Cognitive Tools to Aid Statistical Science: Replace Confidence and Significance by Compatibility and Surprise

Zad Rafi, Sander Greenland
DOI: https://doi.org/10.1186/s12874-020-01105-9
2020-10-01
Abstract: Researchers often misinterpret and misrepresent statistical outputs. This abuse has led to a large literature on modification or replacement of testing thresholds and $P$-values with confidence intervals, Bayes factors, and other devices. Because the core problems appear cognitive rather than statistical, we review simple aids to statistical interpretations. These aids emphasize logical and information concepts over probability, and thus may be more robust to common misinterpretations than are traditional descriptions. We use the Shannon transform of the $P$-value $p$, also known as the binary surprisal or $S$-value $s=-\log_{2}(p)$, to measure the information supplied by the testing procedure, and to help calibrate intuitions against simple physical experiments like coin tossing. We also use tables or graphs of test statistics for alternative hypotheses, and interval estimates for different percentile levels, to thwart fallacies arising from arbitrary dichotomies. Finally, we reinterpret $P$-values and interval estimates in unconditional terms, which describe compatibility of data with the entire set of analysis assumptions. We illustrate these methods with a reanalysis of data from an existing record-based cohort study. In line with other recent recommendations, we advise that teaching materials and research reports discuss $P$-values as measures of compatibility rather than significance, compute $P$-values for alternative hypotheses whenever they are computed for null hypotheses, and interpret interval estimates as showing values of high compatibility with data, rather than regions of confidence. Our recommendations emphasize cognitive devices for displaying the compatibility of the observed data with various hypotheses of interest, rather than focusing on single hypothesis tests or interval estimates. We believe these simple reforms are well worth the minor effort they require.
Methodology, Quantitative Methods, Applications
What problem does this paper attempt to address?
This paper addresses the misinterpretation and misuse of traditional statistical outputs such as P-values and confidence intervals. The authors argue that these problems are cognitive rather than purely statistical, and they propose several simple aids to help researchers interpret statistical results more accurately. The aids emphasize logical and information concepts over probability, with the aim of being more robust to common misinterpretations.

### Background and Core Issues of the Paper

1. **Background**:
   - Researchers often misinterpret and misrepresent statistical outputs, which has produced a large literature on modifying or replacing traditional hypothesis-testing thresholds and P-values.
   - Because the core problems are cognitive rather than statistical, such replacements are often of limited effectiveness.
2. **Core Issues**:
   - Traditional terms such as "significance", "non-significance", and "confidence interval" invite misunderstanding.
   - Researchers tend to take inferential leaps and shortcuts, so improving practice requires anticipating how terms and interpretations will be read.
   - Although some journals strongly opposed reporting P-values many years ago, misunderstandings about statistical significance persist.

### Solutions

1. **Using Compatibility and Surprisal Values**:
   - The authors suggest using the Shannon transform of the P-value, the binary surprisal or S-value $s = -\log_{2}(p)$, to measure the information supplied by the testing procedure and to help calibrate intuitions against simple experiments such as coin tossing.
   - Tables or graphs of test statistics for alternative hypotheses, and interval estimates at different percentile levels, help avoid fallacies that arise from arbitrary dichotomies.
2. **Reinterpreting P-Values and Interval Estimates**:
   - P-values and interval estimates are described in unconditional terms, as measures of the compatibility of the data with the entire set of analysis assumptions, rather than with a single tested hypothesis alone.
3. **Suggestions for Improving Teaching Materials and Research Reports**:
   - Discuss P-values as measures of compatibility rather than significance.
   - Compute P-values for alternative hypotheses whenever they are computed for null hypotheses.
   - Interpret interval estimates as showing values highly compatible with the data, rather than as regions of confidence.

### Example Applications

The paper illustrates these methods with a reanalysis of data from a record-based cohort study on the association between the use of selective serotonin reuptake inhibitors (SSRIs) during pregnancy and autism spectrum disorder (ASD) in offspring. By computing P-values and S-values for several hypotheses, the authors show how to interpret the results more accurately and avoid common misreadings.

### Conclusion

The authors conclude that these simple reforms are well worth the minor effort they require. Adopting them can reduce the misreading of traditional statistical outputs and improve the accuracy and reliability of scientific research.
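As a rough illustration of the S-value transform $s = -\log_{2}(p)$ and of computing P-values for alternative hypotheses, the following minimal Python sketch may help. The function names `s_value` and `wald_p_value`, the normal (Wald) approximation, and the numeric inputs are assumptions made for illustration only; they are not taken from the paper or its cohort reanalysis.

```python
import math


def s_value(p: float) -> float:
    """Binary surprisal (S-value) in bits: s = -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


def wald_p_value(estimate: float, std_error: float, hypothesized: float) -> float:
    """Two-sided P-value for a hypothesized parameter value.

    Assumes a normal approximation for the estimator (an illustrative
    choice, not necessarily the method used in the paper).
    """
    if std_error <= 0.0:
        raise ValueError("std_error must be positive")
    z = abs(estimate - hypothesized) / std_error
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# Coin-tossing calibration: s bits is about as surprising as seeing
# s heads in a row from a coin assumed to be fair.
for p in (0.5, 0.25, 0.05):
    print(f"p = {p:<5} -> s = {s_value(p):.2f} bits")

# Hypothetical log rate-ratio estimate and standard error (made-up numbers).
log_rr_hat = math.log(1.5)
se = 0.25

# Test several hypothesized rate ratios, not only the null value of 1.
for rr in (1.0, 2.0, 3.0):
    p = wald_p_value(log_rr_hat, se, math.log(rr))
    print(f"RR = {rr}: p = {p:.3f}, s = {s_value(p):.2f} bits")
```

For example, $p = 0.05$ corresponds to about 4.3 bits of information against the tested hypothesis and the other analysis assumptions, roughly as surprising as four heads in a row from a fair coin. Reporting results for several hypothesized values side by side avoids singling out the null hypothesis.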