Abstract: As a technique that can compactly represent complex patterns, machine learning has significant potential for predictive inference. K-fold cross-validation (CV) is the most common approach to ascertaining the likelihood that a machine learning outcome is generated by chance, and it frequently outperforms conventional hypothesis testing. This improvement relies on measures obtained directly from machine learning classifications, such as accuracy, that do not have a parametric description. To approach a frequentist analysis within machine learning pipelines, a permutation test or simple statistics from data partitions (i.e., folds) can be added to estimate confidence intervals. Unfortunately, neither parametric nor non-parametric tests solve the inherent problems of partitioning small-sample datasets and learning from heterogeneous data sources. Because machine learning depends strongly on the learning parameters and on the distribution of data across folds, it recapitulates familiar difficulties with excess false positives and replication. The origins of this problem are demonstrated by simulating common experimental circumstances, including small sample sizes, low numbers of predictors, and heterogeneous data sources. A novel statistical test based on K-fold CV and the Upper Bound of the actual error (K-fold CUBV) is constructed, in which the uncertain predictions of machine learning with CV are bounded by the worst case through the evaluation of concentration inequalities. Probably Approximately Correct (PAC)-Bayesian upper bounds for linear classifiers, in combination with K-fold CV, are used to estimate the empirical error. Performance on neuroimaging datasets suggests that this is a robust criterion for detecting effects, validating accuracy values obtained from machine learning whilst avoiding excess false positives.
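The abstract contrasts two ways of attaching a decision to a K-fold CV result. The first is a permutation test on the CV error. The second is a worst-case upper bound on the actual error, derived from a concentration inequality. The sketch below illustrates both under stated assumptions and is not the K-fold CUBV implementation. It assumes a two-class problem and uses a hypothetical nearest-centroid rule as the linear classifier. In place of the PAC-Bayesian bounds the abstract describes, it substitutes a plain Hoeffding bound on each held-out fold. All function names are illustrative only.

```python
import math
import random


def kfold_indices(n, k, rng):
    """Shuffle sample indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Per-class mean vectors; nearest-centroid is a simple linear classifier."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        s = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            s[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(centroids, x):
    """Label of the centroid closest to x in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def kfold_errors(X, y, k, seed=0):
    """(test-fold size, empirical test error) for each of the k folds."""
    rng = random.Random(seed)
    results = []
    for test in kfold_indices(len(X), k, rng):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(cents, X[i]) != y[i] for i in test)
        results.append((len(test), wrong / len(test)))
    return results


def cv_error(X, y, k, seed=0):
    """Pooled K-fold CV error: misclassified held-out samples / all samples."""
    folds = kfold_errors(X, y, k, seed)
    return sum(m * e for m, e in folds) / sum(m for m, _ in folds)


def permutation_pvalue(X, y, k, n_perm=100, seed=0):
    """Share of label permutations whose CV error is <= the observed error."""
    observed = cv_error(X, y, k, seed)
    rng = random.Random(seed + 1)
    hits = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if cv_error(X, y_perm, k, seed) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def worst_case_upper_bound(X, y, k, delta=0.05, seed=0):
    """Largest per-fold Hoeffding upper bound on the true error.

    Each fold's classifier is scored on m held-out samples it never saw, so
    its true error exceeds emp_err + sqrt(ln(k/delta) / (2m)) with probability
    at most delta/k; a union bound over the k folds makes all k bounds hold
    together with probability at least 1 - delta.
    """
    return max(e + math.sqrt(math.log(k / delta) / (2 * m))
               for m, e in kfold_errors(X, y, k, seed))


# Decision rule (two balanced classes, chance error 0.5): report an effect
# only if worst_case_upper_bound(X, y, k) < 0.5.
```

Consider 100 samples with k = 5 and δ = 0.05. Each fold then holds 20 test samples, and the Hoeffding slack alone is about 0.34. A fold's empirical error must therefore fall below roughly 0.16 before the bound drops under chance (0.5). This shows why a worst-case bound is conservative at small sample sizes, where a permutation test may still report significance.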

General Approximate Cross Validation for Model Selection

Approximate Cross-validation: Guarantees for Model Assessment and Selection

Iterative Approximate Cross-Validation

Fast Cross-Validation for Kernel-Based Algorithms

Efficient Cross-Validation for Semi-Supervised Learning

Is Cross-Validation the Gold Standard to Evaluate Model Performance?

Cross-validation: what does it estimate and how well does it do it?

Bootstrapping the Cross-Validation Estimate

Double Cross Validation for the Number of Factors in Approximate Factor Models

Efficient Approximation of Cross-Validation for Kernel Methods Using Bouligand Influence Function

Fast and Informative Model Selection using Learning Curve Cross-Validation

On The Smoothness of Cross-Validation-Based Estimators Of Classifier Performance

Optimizing for Generalization in Machine Learning with Cross-Validation Gradients

Efficient, adaptive cross-validation for tuning and comparing models, with application to drug discovery

Least Squares Model Averaging Based on Generalized Cross Validation

Approximate Cross-validated Mean Estimates for Bayesian Hierarchical Regression Models

Bootstrapping the Out-of-sample Predictions for Efficient and Accurate Cross-Validation

Model Selection Via Multifold Cross Validation

Is K-fold cross validation the best model selection method for Machine Learning?

Granularity Selection for Cross-Validation of SVM

Kernel-based Generalized Cross-validation in Non-parametric Mixed-effect Models