Perturbation Validation: A New Heuristic to Validate Machine Learning Models

Jie M. Zhang, M. Harman, J. Shawe-Taylor, Earl T. Barr, Benjamin Guedj
2019-05-24
Abstract: This paper introduces Perturbation Validation (PV), a new heuristic to validate machine learning models. PV does not rely on test data. Instead, it perturbs training data labels, re-trains the model against the perturbed data, then uses the consequent training accuracy decrease rate to assess model fit. PV also differs from traditional statistical approaches, which make judgements without considering label distribution. We evaluate PV on 10 real-world datasets and 6 synthetic datasets. Our results demonstrate that PV is more discriminating about model fit than existing validation approaches and that it accords well with widely held intuitions concerning the properties of a good model fit measurement. We also show that PV complements existing validation approaches, allowing us to give explanations for some of the issues present in the recently debated "apparent paradox" that high capacity (potentially "overfitted") models may, nevertheless, exhibit good generalisation ability.
Mathematics, Computer Science
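
The procedure outlined in the abstract (perturb the training labels, re-train, and track how quickly training accuracy falls) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the toy one-dimensional dataset, the two stand-in models (`fit_centroid` and `fit_memoriser`), the noise rates, and the choice to summarise the "decrease rate" as the negated least-squares slope of training accuracy against noise rate.

```python
import random


def flip_labels(labels, rate, rng):
    """Return a copy of binary labels with a fraction `rate` of them flipped."""
    noisy = list(labels)
    n_flip = round(rate * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy


def fit_centroid(xs, ys):
    """Hypothetical low-capacity model: predict the class with the nearest mean."""
    means = {}
    for c in set(ys):
        members = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(members) / len(members)
    return lambda x: min(means, key=lambda c: abs(x - means[c]))


def fit_memoriser(xs, ys):
    """Hypothetical high-capacity model: memorise every training label."""
    table = dict(zip(xs, ys))
    return lambda x: table[x]


def training_accuracy(fit, xs, ys):
    """Train on (xs, ys) and score on the same data."""
    predict = fit(xs, ys)
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


def accuracy_decrease_rate(fit, xs, ys, rates, seed=0):
    """Negated least-squares slope of training accuracy versus label-noise rate.

    Using a regression slope is an assumption of this sketch.
    """
    rng = random.Random(seed)
    accs = [training_accuracy(fit, xs, flip_labels(ys, r, rng)) for r in rates]
    mean_r = sum(rates) / len(rates)
    mean_a = sum(accs) / len(accs)
    cov = sum((r - mean_r) * (a - mean_a) for r, a in zip(rates, accs))
    var = sum((r - mean_r) ** 2 for r in rates)
    return -cov / var


# Toy data: distinct integer inputs, label 1 for non-negative inputs.
xs = list(range(-50, 50))
ys = [0 if x < 0 else 1 for x in xs]
rates = [0.0, 0.1, 0.2, 0.3]

pv_centroid = accuracy_decrease_rate(fit_centroid, xs, ys, rates)
pv_memoriser = accuracy_decrease_rate(fit_memoriser, xs, ys, rates)
print("centroid decrease rate: ", pv_centroid)
print("memoriser decrease rate:", pv_memoriser)
```

In this sketch the memorising model fits any labelling perfectly, so its training accuracy does not fall as noise is added and its decrease rate is zero. The nearest-mean model cannot absorb the flipped labels, so its training accuracy drops as the noise rate grows. This is only meant to convey the intuition that the decrease rate separates models that fit the signal from models that fit anything.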