The Classification Permutation Test: A Nonparametric Test for Equality of Multivariate Distributions

Johann Gagnon-Bartsch, Yotam Shem-Tov
DOI: https://doi.org/10.48550/arXiv.1611.06408
2016-11-19
Applications
Abstract: The gold standard for identifying causal relationships is a randomized controlled experiment. In many applications in the social sciences and medicine, the researcher does not control the assignment mechanism and instead may rely upon natural experiments, regression discontinuity designs, RCTs with attrition, or matching methods as a substitute for experimental randomization. The standard testable implication of random assignment is covariate balance between the treated and control units. Covariate balance is therefore commonly used to validate the claim of "as-if" random assignment. We develop a new nonparametric test of covariate balance. Our Classification Permutation Test (CPT) combines classification methods (e.g., logistic regression or random forests) with Fisherian permutation inference. The CPT is guaranteed to have correct coverage and is consistent under weak assumptions on the chosen classifier. To illustrate the gains of using the CPT, we revisit four real data examples: Lyall (2009); Green and Winik (2010); Eggers and Hainmueller (2009); and Rouse (1995). Monte Carlo power simulations are used to compare the CPT to two existing nonparametric tests of equality of multivariate distributions.
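The idea of pairing a classifier with permutation inference can be illustrated with a minimal sketch. The abstract does not specify the test statistic or implementation, so everything below is an assumption made for illustration: the statistic is in-sample classification accuracy, the classifier is a least-squares linear rule (a stand-in for logistic regression or random forests, chosen only to keep the example dependency-free), and the names `classification_accuracy` and `cpt_pvalue` are hypothetical. If the covariates are balanced, the treatment labels should be about as hard to predict as randomly permuted labels.

```python
import numpy as np

def classification_accuracy(X, t):
    """In-sample accuracy of a least-squares linear classifier.

    Illustrative stand-in only; any classifier could be substituted here.
    """
    A = np.column_stack([np.ones(len(X)), X])     # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)  # regress labels on covariates
    pred = (A @ coef > 0.5).astype(int)           # threshold fitted values at 0.5
    return np.mean(pred == t)

def cpt_pvalue(X, t, n_perm=500, seed=0):
    """Permutation p-value: how often do shuffled labels classify as well?"""
    rng = np.random.default_rng(seed)
    observed = classification_accuracy(X, t)
    # Re-fit the classifier on each permuted label vector to build the null.
    null = np.array([
        classification_accuracy(X, rng.permutation(t)) for _ in range(n_perm)
    ])
    # The +1 terms count the observed assignment as one of the permutations.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Simulated data (assumed, not from the paper): 200 units, 3 covariates.
rng = np.random.default_rng(42)
t = np.repeat([0, 1], 100)

X_balanced = rng.normal(size=(200, 3))   # covariates unrelated to treatment
X_imbalanced = X_balanced.copy()
X_imbalanced[t == 1] += 1.0              # treated units shifted in every covariate

p_balanced = cpt_pvalue(X_balanced, t)
p_imbalanced = cpt_pvalue(X_imbalanced, t)
print("balanced p-value:  ", p_balanced)
print("imbalanced p-value:", p_imbalanced)
```

In this sketch the classifier is re-fit on every permutation, so overfitting inflates the observed and the permuted accuracies alike; that is why an in-sample statistic can still yield a valid permutation p-value.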