Evaluating the median p-value method for assessing the statistical significance of tests when using multiple imputation

Peter C. Austin, Iris Eekhout, Stef van Buuren
DOI: https://doi.org/10.1080/02664763.2024.2418473
Impact factor: 1.416
2024-10-27
Journal of Applied Statistics
Abstract: Rubin's Rules are commonly used to pool the results of statistical analyses across imputed samples when using multiple imputation. Rubin's Rules cannot be used when the result of an analysis in an imputed dataset is not a statistic and its associated standard error, but a test statistic (e.g., Student's t-test). While complex methods have been proposed for pooling test statistics across imputed samples, these methods have not been implemented in many popular statistical software packages. The median p-value method has been proposed for pooling test statistics: the statistical significance level of the pooled test statistic is the median of the associated p-values across the imputed samples. We evaluated the performance of this method with nine statistical tests: Student's t-test, the Wilcoxon Rank Sum test, Analysis of Variance, the Kruskal-Wallis test, the tests of significance for Pearson's and Spearman's correlation coefficients, the Chi-squared test, and the tests of significance for a regression coefficient from a linear regression and from a logistic regression. For each test, the empirical type I error rate was higher than the advertised (nominal) rate. The magnitude of inflation increased as the prevalence of missing data increased. The median p-value method should not be used to assess statistical significance across imputed datasets.
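To make the contrast in the abstract concrete, here is a minimal Python sketch (standard library only) of the two pooling approaches it names. The function names `median_p_value` and `pool_rubin` are hypothetical, chosen for illustration. `pool_rubin` applies Rubin's Rules to a scalar estimate and its standard error from each imputed dataset; as a simplifying assumption it computes the p-value from a normal reference distribution, whereas Rubin's Rules proper refer the pooled Wald statistic to a t distribution with the degrees of freedom the function also returns. `median_p_value` implements the median rule the paper evaluated, which per the abstract inflates the type I error rate, so it is shown for comparison only.

```python
import math
import statistics


def median_p_value(p_values):
    """Median p-value rule: the pooled p-value is the median of the
    per-imputation p-values. With an even number of imputations,
    statistics.median averages the two middle values."""
    if not p_values:
        raise ValueError("need at least one p-value")
    return statistics.median(p_values)


def pool_rubin(estimates, std_errors):
    """Pool a scalar estimate and its standard error with Rubin's Rules.

    Returns (pooled_estimate, total_variance, df, p_value).
    Assumption: p_value uses a normal reference distribution for simplicity;
    Rubin's Rules proper use a t distribution with `df` degrees of freedom.
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need matching estimates and SEs from >= 2 imputations")
    q_bar = statistics.fmean(estimates)                     # pooled point estimate
    u_bar = statistics.fmean(se ** 2 for se in std_errors)  # mean within-imputation variance
    b = statistics.variance(estimates)                      # between-imputation variance (n - 1 denominator)
    t = u_bar + (1 + 1 / m) * b                             # total variance
    # Rubin's (1987) degrees of freedom: (m - 1) * (1 + 1/r)^2,
    # where r = (1 + 1/m) * B / U_bar; infinite when B == 0.
    if b == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    z = q_bar / math.sqrt(t)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return q_bar, t, df, p_value


# Hypothetical per-imputation results from m = 3 imputed datasets.
print(median_p_value([0.01, 0.04, 0.20]))  # 0.04
print(pool_rubin([1.0, 1.2, 0.8], [0.5, 0.5, 0.5]))
```

The design difference is the point: Rubin's Rules carry the between-imputation variance `b` into the total variance, while the median rule discards it and summarizes only the p-values, which is consistent with the inflation the authors report growing as the proportion of missing data increases.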
statistics & probability