Testing for Overfitting

James Schmidt
2023-05-10
Abstract: High-complexity models are notorious in machine learning for overfitting, a phenomenon in which models represent the training data well but fail to generalize to the underlying data-generating process. A typical procedure for circumventing overfitting computes the empirical risk on a holdout set and halts training once (or flags when) that risk begins to increase. This practice often helps produce a well-generalizing model, but the justification for why it works is primarily heuristic. We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data. We then introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data, and overfitting may be quantitatively defined and detected. We rely on the same concentration bounds, which guarantee that empirical means should, with high probability, approximate their true means, to conclude that the training and holdout empirical means should approximate each other. We stipulate conditions under which this test is valid, describe how it may be used to identify overfitting, articulate a further nuance according to which distributional shift may be flagged, and highlight an alternative notion of learning that usefully captures generalization in the absence of uniform PAC guarantees.
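The holdout procedure mentioned in the abstract (compute empirical risk on a holdout set, halt once it begins to increase) can be sketched as a generic early-stopping rule. This is not the paper's method; the function name `early_stopping_epoch` and the `patience` parameter (how many non-improving epochs to tolerate before halting) are illustrative assumptions.

```python
from typing import Sequence


def early_stopping_epoch(holdout_risks: Sequence[float], patience: int = 1) -> int:
    """Return the index of the epoch whose model would be kept.

    Hypothetical helper: training is treated as halting once the holdout
    risk has failed to improve on its best value for `patience` consecutive
    epochs; the model from the best epoch seen so far is returned.
    """
    if not holdout_risks:
        raise ValueError("need at least one holdout risk")
    if patience < 1:
        raise ValueError("patience must be at least 1")
    best_epoch, best_risk, bad_epochs = 0, holdout_risks[0], 0
    for epoch in range(1, len(holdout_risks)):
        if holdout_risks[epoch] < best_risk:
            # New best holdout risk: remember it and reset the counter.
            best_epoch, best_risk, bad_epochs = epoch, holdout_risks[epoch], 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # holdout risk has stopped improving: halt training
    return best_epoch
```

With the risks `[0.9, 0.7, 0.5, 0.45, 0.47, 0.50, 0.40]`, `patience=1` keeps epoch 3, while `patience=3` runs on and keeps epoch 6. The stopping point depends on an ad hoc choice, which illustrates why the justification for this practice is largely heuristic.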
Machine Learning
What problem does this paper attempt to address?
The paper primarily aims to address the issue of overfitting in machine learning. Specifically:

1. **Explain the overfitting problem**: The paper first discusses in detail the phenomenon of overfitting, where a model performs well on training data but generalizes poorly to new data.
2. **Limitations of standard methods**: The traditional approach is to hold out a validation set during training to detect overfitting. However, this method lacks rigorous statistical justification and treats the training and validation sets as entirely separate, leaving open how the training data itself can be used to assess model performance.
3. **Propose a new statistical test**: To overcome these limitations, the author proposes a statistical test that can evaluate model performance on training data and quantitatively define and detect overfitting. The test is based on a modified Law of Large Numbers and compares performance on the training set with performance on the validation set to determine whether the model is overfitting.
4. **Detect distributional shift**: The test can also detect changes in the data distribution, further distinguishing overfitting from other potential issues.
5. **Relax the notion of learning**: The paper introduces a weaker but still rich notion of learning that captures generalization even in the absence of uniform Probably Approximately Correct (PAC) guarantees.

In summary, the paper aims to provide a more rigorous method for detecting and quantifying overfitting, while offering new insights and tools for addressing generalization in machine learning.
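The idea in item 3, that training and holdout empirical risks should approximate each other if both concentrate around the same true risk, can be illustrated with a minimal sketch. This is not the paper's test: it assumes per-example losses bounded in [0, 1], uses Hoeffding's inequality as the concentration bound, and splits the significance level across the two samples with a union bound. Because the model was fit on the training data, concentration of the training risk cannot be taken for granted, so the sketch treats it as the null hypothesis to be tested. The names `hoeffding_radius` and `overfitting_test` are hypothetical.

```python
import math
from typing import Sequence


def hoeffding_radius(n: int, delta: float) -> float:
    """Half-width eps with P(|sample mean - true mean| > eps) <= delta,
    for n i.i.d. losses in [0, 1] (two-sided Hoeffding bound)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def overfitting_test(train_losses: Sequence[float],
                     holdout_losses: Sequence[float],
                     alpha: float = 0.05) -> dict:
    """Hypothetical test: flag a train/holdout gap too large for sampling noise.

    Assumed null hypothesis: both samples behave like i.i.d. draws with the
    same true mean and every loss lies in [0, 1]. Each sample gets alpha / 2,
    so under the null the test rejects with probability at most alpha.
    """
    n, m = len(train_losses), len(holdout_losses)
    if n == 0 or m == 0:
        raise ValueError("need at least one training and one holdout loss")
    if not all(0.0 <= x <= 1.0 for x in list(train_losses) + list(holdout_losses)):
        raise ValueError("losses must lie in [0, 1] for this Hoeffding bound")
    train_risk = sum(train_losses) / n
    holdout_risk = sum(holdout_losses) / m
    # Positive gap: holdout risk exceeds training risk (the overfitting pattern).
    gap = holdout_risk - train_risk
    threshold = hoeffding_radius(n, alpha / 2) + hoeffding_radius(m, alpha / 2)
    return {
        "train_risk": train_risk,
        "holdout_risk": holdout_risk,
        "gap": gap,
        "threshold": threshold,
        "reject": abs(gap) > threshold,
    }
```

For example, 1,000 training losses of 0.0 against 1,000 holdout losses of 0.5 give a gap of 0.5, well above the threshold of roughly 0.094, so the null is rejected. The sketch only reports the direction of the gap; it does not attempt the further distinction between overfitting and distributional shift that the paper describes.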