Testing for Overfitting

James Schmidt

2023-05-10

Abstract: High-complexity models are notorious in machine learning for overfitting, a phenomenon in which models represent the training data well but fail to generalize to the underlying data-generating process. A typical procedure for circumventing overfitting computes the empirical risk on a holdout set and halts training once that risk begins to increase (or flags when it does). Such practice often helps produce a well-generalizing model, but the justification for why it works is primarily heuristic. We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data. We then introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data, and overfitting may be quantitatively defined and detected. We rely on these concentration bounds, which guarantee that empirical means should, with high probability, approximate their true mean, to conclude that the training and holdout empirical means should approximate each other. We stipulate conditions under which this test is valid, describe how it may be used to identify overfitting, articulate a further nuance by which distributional shift may be flagged, and highlight an alternative notion of learning that usefully captures generalization in the absence of uniform PAC guarantees.
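
The holdout procedure described in the abstract can be sketched as below. This is a minimal illustration, not code from the paper: the function name `early_stopping_epoch`, the `patience` parameter, and the risk values are hypothetical, and the sketch assumes the holdout risk has already been computed after each training epoch.

```python
from typing import Sequence


def early_stopping_epoch(holdout_risks: Sequence[float], patience: int = 1) -> int:
    """Return the epoch whose model would be kept under holdout early stopping.

    Training halts once the holdout risk has failed to improve on its best
    value for `patience` consecutive epochs; the epoch with the lowest
    holdout risk seen so far is returned.
    """
    if not holdout_risks:
        raise ValueError("need at least one holdout risk")
    best_epoch, best_risk = 0, holdout_risks[0]
    epochs_without_improvement = 0
    for epoch, risk in enumerate(holdout_risks[1:], start=1):
        if risk < best_risk:
            best_epoch, best_risk = epoch, risk
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # holdout risk stopped improving: halt training here
    return best_epoch


# Hypothetical holdout risks recorded after each training epoch.
risks = [0.40, 0.31, 0.32, 0.25, 0.26]
print(early_stopping_epoch(risks))              # 1
print(early_stopping_epoch(risks, patience=2))  # 3
```

With `patience=1` the rule stops at the first uptick, even though the risk later falls further; a larger patience tolerates such fluctuations at the cost of extra epochs. Either way the rule is heuristic: nothing in it says how large an increase is statistically meaningful.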

Machine Learning

What problem does this paper attempt to address?

The paper primarily aims to address the issue of overfitting in machine learning. Specifically:

1. **Explain the overfitting problem**: The paper first discusses in detail the phenomenon of overfitting, in which a model performs well on training data but generalizes poorly to new data.
2. **Limitations of standard methods**: The traditional approach holds out a validation set during training to detect overfitting. However, this method lacks rigorous statistical justification, and it treats the training and validation sets as entirely separate entities, which can lead to misconceptions about how training data may be used for evaluation.
3. **Propose a new statistical test**: To overcome these limitations, the author proposes a statistical test that can evaluate model performance on training data and quantitatively define and detect overfitting. The method is based on a modified Law of Large Numbers: it compares the model's empirical risk on the training set with its empirical risk on the validation set to determine whether the model is overfitting.
4. **Detect distributional shift**: The test can also flag changes in the data distribution, distinguishing overfitting from other potential problems.
5. **Weaken the requirement of uniform bounds**: The paper also introduces a weaker but still rich notion of learning that captures generalization even in the absence of uniform Probably Approximately Correct (PAC) guarantees.

In summary, the paper aims to provide a more rigorous way to detect and quantify overfitting, while offering new insights and tools for addressing generalization in machine learning.
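
One way to make the comparison in item 3 concrete is sketched below. It is not the paper's test. It assumes losses bounded in [0, 1] (for example 0-1 classification loss) and uses Hoeffding's inequality, P(|R̂_n − R| ≥ t) ≤ 2·exp(−2nt²), to bound how far each empirical mean R̂_n may stray from the true risk R. Splitting the significance level α evenly across the two samples gives, with probability at least 1 − α, |R̂_train − R̂_hold| ≤ t_train + t_hold, where t_n = sqrt(ln(4/α) / (2n)). The sketch flags overfitting when the holdout risk exceeds the training risk by more than this tolerance. As the abstract stresses, such bounds do not automatically hold for training data, because the model was fit to it; the paper stipulates conditions under which its test is valid, and this sketch simply assumes them. The names `overfit_test`, `hoeffding_radius`, and `OverfitTestResult` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class OverfitTestResult:
    train_risk: float
    holdout_risk: float
    gap: float        # holdout_risk - train_risk
    threshold: float  # Hoeffding-based tolerance for the gap
    overfit: bool     # True if the gap exceeds the tolerance


def hoeffding_radius(n: int, delta: float) -> float:
    """Radius t with P(|mean - mu| >= t) <= delta for n i.i.d. values in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def overfit_test(train_losses: Sequence[float],
                 holdout_losses: Sequence[float],
                 alpha: float = 0.05) -> OverfitTestResult:
    """Flag overfitting when training risk is implausibly far below holdout risk.

    Assumption: losses lie in [0, 1] and, under the null hypothesis, both
    empirical means concentrate around the same true risk.
    """
    for losses in (train_losses, holdout_losses):
        if not losses:
            raise ValueError("loss samples must be non-empty")
        if any(not 0.0 <= x <= 1.0 for x in losses):
            raise ValueError("losses must lie in [0, 1]")
    train_risk = sum(train_losses) / len(train_losses)
    holdout_risk = sum(holdout_losses) / len(holdout_losses)
    # Union bound: each mean is within its radius of the true risk with
    # probability at least 1 - alpha / 2, so both hold with prob. >= 1 - alpha.
    threshold = (hoeffding_radius(len(train_losses), alpha / 2)
                 + hoeffding_radius(len(holdout_losses), alpha / 2))
    gap = holdout_risk - train_risk
    return OverfitTestResult(train_risk, holdout_risk, gap, threshold, gap > threshold)


# Hypothetical 0-1 losses: zero training error vs. 20% holdout error.
holdout = [1.0] * 200 + [0.0] * 800
result = overfit_test([0.0] * 1000, holdout)
print(result.overfit, round(result.threshold, 3))  # True 0.094

# Hypothetical 18% training error vs. the same 20% holdout error.
print(overfit_test([1.0] * 180 + [0.0] * 820, holdout).overfit)  # False
```

The comparison is one-sided because the usual overfitting signature is a training risk well below the holdout risk; a gap in the opposite direction is not interpreted by this sketch.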
