Abstract: Recent likelihood theory produces $p$-values that have remarkable accuracy and wide applicability. The calculations use familiar tools such as maximum likelihood values (MLEs), observed information and parameter rescaling. Such $p$-values are usually evaluated by simulation, and the simulations do verify that the global distribution of the $p$-values is uniform(0, 1) to high accuracy in repeated sampling. The derivation of the $p$-values, however, asserts a stronger statement: that they have a uniform(0, 1) distribution conditionally, given identified precision information provided by the data. We take a simple regression example that involves exact precision information and use large sample techniques to extract highly accurate information as to the statistical position of the data point with respect to the parameter: specifically, we examine various $p$-values and Bayesian posterior survivor $s$-values for validity. With observed data we numerically evaluate the various $p$-values and $s$-values, and we also record the related general formulas. We then assess the numerical values for accuracy using Markov chain Monte Carlo (McMC) methods. We also propose some third-order likelihood-based procedures for obtaining means and variances of Bayesian posterior distributions, again followed by McMC assessment. Finally we propose some adaptive McMC methods to improve the simulation acceptance rates. All these methods are based on asymptotic analysis that derives from the effect of additional data, and they use simple calculations based on familiar maximizing values and related informations. The example illustrates the general formulas and the ease of calculations, while the McMC assessments demonstrate the numerical validity of the $p$-values as the percentage position of a data point.
The example, however, is very simple and transparent, and thus gives little indication that in a wide generality of models the formulas do accurately separate information for almost any parameter of interest, and then do give accurate $p$-value determinations from that information. As an illustration, an enigmatic problem in the literature is discussed and simulations are recorded; various examples in the literature are cited.
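
The simulation check mentioned in the abstract (that $p$-values are uniform(0, 1) in repeated sampling) can be sketched in a few lines. This is only a minimal illustration under assumptions that are not taken from the paper: a simple linear regression with a fixed design, normal errors with known standard deviation, and an exact first-order $z$-based $p$-value for the slope rather than the third-order formulas the abstract refers to. The function names, design points and parameter values are all hypothetical.

```python
import math
import random

def slope_p_value(x, y, beta0, sigma):
    """Left-tail p-value for slope value beta0 in y = a + b*x + e,
    with e ~ N(0, sigma^2) and sigma assumed known (illustrative assumption)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    # Least-squares (equivalently, maximum likelihood) slope estimate.
    b_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    # Standardized departure of the estimate from the tested value.
    z = (b_hat - beta0) / (sigma / math.sqrt(sxx))
    # Standard normal CDF evaluated at z.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def simulate_p_values(n_rep=20000, seed=0):
    """Repeated sampling with the design held fixed; tests the true slope."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 8)]   # hypothetical fixed design
    alpha, beta, sigma = 1.0, 0.5, 2.0    # hypothetical true values
    pvals = []
    for _ in range(n_rep):
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        pvals.append(slope_p_value(x, y, beta, sigma))
    return pvals

def max_cdf_gap(pvals):
    """Largest gap between the empirical CDF and the uniform(0, 1) CDF."""
    s = sorted(pvals)
    n = len(s)
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(s))

if __name__ == "__main__":
    p = simulate_p_values()
    print("mean p-value:", sum(p) / len(p))
    print("max CDF gap from uniform:", max_cdf_gap(p))
```

Because the design is held fixed across replications, the check here is conditional on that design. A small maximum gap between the empirical and uniform distribution functions is the kind of evidence the abstract describes as verifying uniformity in repeated sampling.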

Too Big to Fail: Larger Samples and False Discoveries

Research Commentary - Too Big to Fail: Large Samples and the p-Value Problem.

Using large samples in econometrics

Testing Theories with Big Data: A SuperPower Approach

Statistical Inference with Large (ecommerce) Datasets

Minor Issues Escalated to Critical Levels in Large Samples: A Permutation-Based Fix

Inference at Scale: Significance Testing for Large Search and Recommendation Experiments

Linear Probability Models in Information Systems Research.

"Medium-n studies" in computing education conferences

Linear Probability Models (LPM) and Big Data: The Good, The Bad, and The Ugly

Semantic and Cognitive Tools to Aid Statistical Science: Replace Confidence and Significance by Compatibility and Surprise

An Automatic Finite-Sample Robustness Metric: When Can Dropping a Little Data Make a Big Difference?

A simple statistical framework for small sample studies

Normality and significance testing in simple linear regression model for large sample sizes: a simulation study

Optimal Subsampling Approaches for Large Sample Linear Regression

Higher Accuracy for Bayesian and Frequentist Inference: Large Sample Theory for Small Sample Likelihood

Impact of Sample Size and Variability on the Power and Type I Error Rates of Equivalence Tests: A Simulation Study.

Power Analysis, Sample Size, and Assessment of Statistical Assumptions—Improving the Evidential Value of Lighting Research

Financial data science: the birth of a new financial research paradigm complementing econometrics?

All about sample-size calculations for A/B testing: Novel extensions and practical guide

A simple model suggesting economically rational sample-size choice drives irreproducibility