Abstract: In machine learning, a small-sample problem arises when the number of available samples is very limited, which makes learning a good model extremely challenging. As the first study on building the most reliable and generalizable model and on providing the most reliable measure of a model's generalization performance, this work creates a large number of random sampling sets from the observation dataset, without or with added noise, to simulate unseen training sets or unseen test sets, respectively. The former make it possible to construct a model with the greatest generalization ability, and the latter make it possible to measure a model's generalization performance in the most reliable manner. In the modeling process, each training set is used to train a model, and the resulting models are combined to form the global (final) model. In the evaluation process, a performance metric (such as mean squared error) is calculated on each test set, and the model's generalization ability is most reliably represented by the average of these metrics. The method is especially suitable for small or even ultra-small-sample problems because it does not require the data splitting used in cross-validation approaches. Comparative experiments on different datasets show the effectiveness of the proposed method for ultra-small-sample problems: for simulated data, the generalization error is approximately 80% lower than that of the conventional method, and for a typical ultra-small-sample problem, glomerular filtration rate prediction on real data, it is 46.36% to 83.02% lower than those of many popular models.
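
The workflow described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the sampling scheme, the noise distribution, the base learner, or the rule for combining models, so everything below is an assumption made for illustration: resampling with replacement, Gaussian noise, a linear least-squares base model, and combination by averaging coefficients. The function names (`make_sampling_sets`, `fit_global_model`, `average_mse`) and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def make_sampling_sets(X, y, n_sets, noise_std=0.0, seed=None):
    """Draw n_sets random sampling sets from the observation dataset.

    Assumption: sampling is with replacement, and noise (if any) is Gaussian.
    noise_std == 0 simulates unseen training sets; noise_std > 0 simulates
    unseen test sets.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    sets = []
    for _ in range(n_sets):
        idx = rng.integers(0, n, size=n)
        Xs, ys = X[idx].copy(), y[idx].copy()
        if noise_std > 0:
            Xs = Xs + rng.normal(0.0, noise_std, size=Xs.shape)
            ys = ys + rng.normal(0.0, noise_std, size=ys.shape)
        sets.append((Xs, ys))
    return sets


def fit_linear(X, y):
    """Least-squares fit with an intercept (assumed base learner)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


def fit_global_model(train_sets):
    """Train one model per training set, then combine them.

    Assumption: models are combined by averaging their coefficients.
    """
    return np.mean([fit_linear(Xs, ys) for Xs, ys in train_sets], axis=0)


def average_mse(coef, test_sets):
    """Mean squared error on each simulated test set, averaged over all sets."""
    errors = [np.mean((predict_linear(coef, Xs) - ys) ** 2) for Xs, ys in test_sets]
    return float(np.mean(errors))


# Toy ultra-small-sample dataset (8 observations, hypothetical).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(8, 1))
y = 2.0 * X[:, 0] + 0.5 + rng.normal(0.0, 0.1, size=8)

train_sets = make_sampling_sets(X, y, n_sets=200, seed=1)                # no noise
test_sets = make_sampling_sets(X, y, n_sets=200, noise_std=0.1, seed=2)  # with noise

coef = fit_global_model(train_sets)
score = average_mse(coef, test_sets)
print("global coefficients:", coef)
print("average MSE over simulated test sets:", score)
```

No observation is held out in this sketch: both the training sets and the test sets are drawn from the full observation dataset, which corresponds to the abstract's point that the method avoids the data splitting of cross-validation.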

Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study

A Monte Carlo Study of Confidence Interval Methods for Generalizability Coefficient.

Confidence Interval Estimation of Predictive Performance in the Context of AutoML

On the Efficacy of Generalization Error Prediction Scoring Functions

Estimating individual treatment effect: generalization bounds and algorithms

Small Sample Inference for Generalization Error in Classification Using the CUD Bound

The Generalization Error of Machine Learning Algorithms

Confidence intervals for the random forest generalization error

An All-Batch Loss for Constructing Prediction Intervals

Confidence Intervals and Regions for Quantiles Using Conditional Monte Carlo and Generalized Likelihood Ratios.

A practical generalization metric for deep networks benchmarking

Testing Generalizability in Causal Inference

Learning a Model with the Most Generality for Small-Sample Problems.

Learning Prediction Intervals for Regression: Generalization and Calibration

Concentration inequalities of the cross-validation estimator for Empirical Risk Minimiser

Estimating Confidence Intervals and Regions for Quantiles by Monte Carlo Simulation

Evaluating machine learning models in non-standard settings: An overview and new findings

Class-wise Generalization Error: an Information-Theoretic Analysis

Confidence Corridors for Multivariate Generalized Quantile Regression

Modeling Generalization in Machine Learning: A Methodological and Computational Study

On the near-optimality of betting confidence sets for bounded means