Abstract: Hypothesis testing of random forest (RF) variable importance measures (VIMP) remains the subject of ongoing research. Among recent developments, heuristic approaches to parametric testing have been proposed whose distributional assumptions are based on empirical evidence. Other formal tests under regularity conditions were derived analytically. However, these approaches can be computationally expensive or even practically infeasible. This problem also occurs with non-parametric permutation tests, which are, however, distribution-free and can generically be applied to any type of RF and VIMP. Embracing this advantage, it is proposed here to use sequential permutation tests and sequential p-value estimation to reduce the high computational costs associated with conventional permutation tests. The popular and widely used permutation VIMP serves as a practical and relevant application example. The results of simulation studies confirm that the theoretical properties of the sequential tests apply, that is, the type-I error probability is controlled at a nominal level and a high power is maintained with considerably fewer permutations needed in comparison to conventional permutation testing. The numerical stability of the methods is investigated in two additional application studies. In summary, theoretically sound sequential permutation testing of VIMP is possible at greatly reduced computational costs. Recommendations for application are given. A respective implementation is provided through the accompanying R package rfvimptest. The approach can also be easily applied to any kind of prediction model.
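
To make the idea of sequential p-value estimation concrete, the following Python sketch implements one well-known stopping rule (Besag and Clifford, 1991): null statistics are drawn one permutation at a time, and sampling stops as soon as h of them are at least as large as the observed statistic, or when a maximum number of permutations is reached. This is an illustration only and not the rfvimptest implementation. The names sequential_p_value and make_null_sampler, the defaults h=10 and max_perms=1000, and the toy mean-difference statistic are all assumptions made for this example. For permutation VIMP, the null draw would presumably permute the variable under test, refit the forest, and return the resulting importance instead.

```python
import random


def sequential_p_value(observed, draw_null_stat, h=10, max_perms=1000):
    """Besag-Clifford style sequential Monte Carlo p-value (illustrative sketch).

    Null statistics are drawn one at a time. Sampling stops once h of them
    are >= observed, or once max_perms draws have been made.
    Returns (p_value_estimate, number_of_draws_used).
    """
    exceed = 0
    for n_draws in range(1, max_perms + 1):
        if draw_null_stat() >= observed:
            exceed += 1
            if exceed == h:
                # Early stop: h exceedances reached after n_draws permutations.
                return h / n_draws, n_draws
    # Permutation budget exhausted: Monte Carlo estimate with +1 correction.
    return (exceed + 1) / (max_perms + 1), max_perms


def mean_diff(x, y):
    """Toy test statistic (assumption): absolute difference in group means."""
    return abs(sum(x) / len(x) - sum(y) / len(y))


def make_null_sampler(x, y, rng):
    """Return a function that yields one permuted statistic per call."""
    pooled = list(x) + list(y)
    n_x = len(x)

    def draw():
        rng.shuffle(pooled)  # random relabelling of the pooled observations
        return mean_diff(pooled[:n_x], pooled[n_x:])

    return draw


rng = random.Random(0)
noise_a = [rng.gauss(0.0, 1.0) for _ in range(30)]
noise_b = [rng.gauss(0.0, 1.0) for _ in range(30)]   # same distribution as noise_a
signal_b = [rng.gauss(1.5, 1.0) for _ in range(30)]  # shifted mean

p_null, n_null = sequential_p_value(
    mean_diff(noise_a, noise_b), make_null_sampler(noise_a, noise_b, rng))
p_signal, n_signal = sequential_p_value(
    mean_diff(noise_a, signal_b), make_null_sampler(noise_a, signal_b, rng))

print("no effect:   p =", p_null, "using", n_null, "permutations")
print("true effect: p =", p_signal, "using", n_signal, "permutations")
```

When there is no effect, exceedances accumulate quickly and sampling typically stops after few permutations; only clearly extreme observed statistics tend to consume the full budget. This is the source of the computational savings relative to a conventional test that always runs all permutations.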

Sequential Permutation Testing of Random Forest Variable Importance Measures

Scalable and Efficient Hypothesis Testing with Random Forests

PERMUTOOLS: A MATLAB Package for Multivariate Permutation Testing

Permutation Tests for Regression, ANOVA, and Comparison of Signals: The permuco Package

Generalized Permutation Framework for Testing Model Variable Significance

Fast Approximation of Small P-values in Permutation Tests by Partitioning the Permutations

A randomized permutation whole-model test heuristic for Self-Validated Ensemble Models (SVEM)

Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival

A studentized permutation test in group sequential designs

Multivariate quantile-based permutation tests with application to functional data

Least Squares-Based Permutation Tests in Time Series

Permutation-based multiple testing when fitting many generalized linear models

Efficiently estimating small p-values in permutation tests using importance sampling and cross-entropy method

Another look at the Lady Tasting Tea and differences between permutation tests and randomization tests

The Classification Permutation Test: A Nonparametric Test for Equality of Multivariate Distributions

Extremely efficient permutation and bootstrap hypothesis tests using R

Permute-match tests: Detecting significant correlations between time series despite nonstationarity and limited replicates

Permutation Tests at Nonparametric Rates

Residual Permutation Test for High-Dimensional Regression Coefficient Testing

Functional Response Designs via the Analytic Permutation Test
