Abstract: The accuracy of machine learning systems is a widely studied research topic. Established techniques such as cross-validation predict the accuracy on unseen data of the classifier produced by applying a given learning method to a given training data set. However, they do not predict whether incurring the cost of obtaining more data and undergoing further training will lead to higher accuracy. In this paper we investigate techniques for making such early predictions. We note that when a machine learning algorithm is presented with a training set, the classifier produced, and hence its error, will depend on the characteristics of the algorithm, on the size of the training set, and on its specific composition. In particular, we hypothesise that if a number of classifiers are produced and their observed error is decomposed into bias and variance terms, then although these components may behave differently, their behaviour may be predictable. We test our hypothesis by building models that, given a measurement taken from the classifier created from a limited number of samples, predict the value that would be measured from the classifier produced when the full data set is presented. We create separate models for bias, variance and total error. Our models are built from the results of applying ten different machine learning algorithms to a range of data sets, and are tested with "unseen" algorithms and data sets. We analyse the results for various numbers of initial training samples and total data set sizes. Results show that our predictions are very highly correlated with the values observed after undertaking the extra training. Finally, we consider the more complex case where an ensemble of heterogeneous classifiers is trained, and show how we can accurately estimate an upper bound on the accuracy achievable after further training.
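The two steps described in the abstract, decomposing the observed error of several classifiers into bias and variance terms and then predicting the full-data value from an early measurement, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `decompose_error` and `fit_early_predictor` are hypothetical, the decomposition uses a majority-vote "main prediction" under 0-1 loss, and the predictor is a plain least-squares line. The abstract does not say which decomposition or model form the authors use.

```python
import numpy as np


def decompose_error(preds, y_true):
    """Illustrative 0-1 loss decomposition (assumed, not the paper's definition).

    preds:  (n_models, n_points) integer class labels, one row per classifier
            trained on a different sample of the training data.
    y_true: (n_points,) integer class labels.
    """
    preds = np.asarray(preds)
    y_true = np.asarray(y_true)
    n_classes = int(max(preds.max(), y_true.max())) + 1

    # Votes per class for every test point: shape (n_classes, n_points).
    votes = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    # "Main prediction": the class most often predicted for each point.
    main_pred = votes.argmax(axis=0)

    return {
        # Fraction of points where the main prediction is wrong.
        "bias": float(np.mean(main_pred != y_true)),
        # Average disagreement of individual classifiers with the main prediction.
        "variance": float(np.mean(preds != main_pred)),
        # Average error rate of the individual classifiers.
        "error": float(np.mean(preds != y_true)),
    }


def fit_early_predictor(early_values, final_values):
    """Fit final = slope * early + intercept by least squares (assumed model form)."""
    slope, intercept = np.polyfit(early_values, final_values, deg=1)
    return lambda x: slope * np.asarray(x, dtype=float) + intercept


# Toy data: three classifiers evaluated on three test points.
parts = decompose_error([[0, 1, 1], [0, 1, 0], [0, 0, 0]], [0, 1, 1])

# Hypothetical (early, final) error pairs from previously seen experiments.
predict_final = fit_early_predictor([0.4, 0.3, 0.2], [0.3, 0.2, 0.1])
estimate = float(predict_final(0.25))
```

In this sketch one predictor would be fitted per quantity (bias, variance, total error), mirroring the separate models mentioned in the abstract. Under 0-1 loss with this particular definition, the bias and variance terms need not sum to the total error.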

When a Classifier Meets More Data

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Towards Data-Algorithm Dependent Generalization: a Case Study on Overparameterized Linear Regression

Modeling Generalization in Machine Learning: A Methodological and Computational Study

Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study

Generalization Error Bounds for Learning under Censored Feedback

Extrapolating Expected Accuracies for Large Multi-Class Problems

Generalization bounds for regression and classification on adaptive covering input domains

Making Early Predictions of the Accuracy of Machine Learning Applications

Classification with many classes: challenges and pluses

Class-wise Generalization Error: an Information-Theoretic Analysis

Understanding deep learning requires rethinking generalization

Generalization Bounds of Regularization Algorithms Derived Simultaneously through Hypothesis Space Complexity, Algorithmic Stability and Data Quality

Rethinking generalization of classifiers in separable classes scenarios and over-parameterized regimes

Data organization limits the predictability of binary classification

Studying Generalization Through Data Averaging

Understanding deep learning (still) requires rethinking generalization

Generalization error for decision problems

Classifier Simulation Algorithm and Its Applications

The Generalization Error of Machine Learning Algorithms

Accuracy on the wrong line: On the pitfalls of noisy data for out-of-distribution generalisation