Geometry and Stability of Supervised Learning Problems

Facundo Mémoli, Brantley Vose, Robert C. Williamson
2024-03-04
Abstract: We introduce a notion of distance between supervised learning problems, which we call the Risk distance. This optimal-transport-inspired distance facilitates stability results; one can quantify how seriously issues like sampling bias, noise, limited data, and approximations might change a given problem by bounding how much these modifications can move the problem under the Risk distance. With the distance established, we explore the geometry of the resulting space of supervised learning problems, providing explicit geodesics and proving that the set of classification problems is dense in a larger class of problems. We also provide two variants of the Risk distance: one that incorporates specified weights on a problem's predictors, and one that is more sensitive to the contours of a problem's risk landscape.
Machine Learning, Metric Geometry
What problem does this paper attempt to address?
The paper studies how to measure the distance between supervised learning problems and how stable those problems are under modification. The authors introduce the "Risk distance," a distance inspired by optimal transport theory that quantifies how similar two supervised learning problems are. With it, one can bound how much issues such as noise, sampling bias, limited data, and approximations change a given problem.

In the paper, a supervised learning problem is defined as a quintuple consisting of an input space, a response space, a joint probability distribution, a loss function, and a predictor set. The study uses the Risk distance to measure how changing one or more of these components affects the problem as a whole, thereby addressing two main questions:

1. To what extent does a compromise (such as noise or data skew) change the problem and its descriptive features?
2. What effect does a combination of several compromises have? Can a series of small changes be guaranteed not to have a significant impact on the problem to be solved?

The Risk distance is built on optimal transport theory and resembles the Gromov-Wasserstein distance, which gives the distance between problems a geometric interpretation. The paper also discusses how modifications to the loss function, the predictor set, and the input and response spaces affect the stability of learning problems, and it proves that certain descriptors (such as the Bayes risk restricted to the predictor set) are stable under the Risk distance. In addition, the authors study the geometry of the space of problems under the Risk distance, including optimal couplings and correspondences, and show that classification problems are dense in a larger class of problems. Finally, the paper proposes two variants of the Risk distance: one that incorporates weights on the predictors, and one that is more sensitive to the contours of the risk landscape.
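To make the quintuple description concrete, here is a minimal Python sketch of a finite supervised learning problem and of one descriptor, the Bayes risk restricted to the predictor set. It does not compute the Risk distance itself. All names (`Problem`, `risk`, `restricted_bayes_risk`, `flip_labels`) and the toy data are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Problem:
    """Toy finite problem: (input space, response space, joint law, loss, predictors)."""
    inputs: List[int]
    responses: List[int]
    joint: Dict[Tuple[int, int], float]   # probability of each (x, y) pair
    loss: Callable[[int, int], float]     # loss(prediction, true response)
    predictors: List[Callable[[int], int]]


def risk(problem: Problem, h: Callable[[int], int]) -> float:
    """Expected loss of predictor h under the joint distribution."""
    return sum(p * problem.loss(h(x), y) for (x, y), p in problem.joint.items())


def restricted_bayes_risk(problem: Problem) -> float:
    """Smallest risk achievable within the problem's predictor set."""
    return min(risk(problem, h) for h in problem.predictors)


def flip_labels(problem: Problem, rate: float) -> Problem:
    """Assumed noise model: each binary label is flipped with probability `rate`."""
    noisy: Dict[Tuple[int, int], float] = {}
    for (x, y), p in problem.joint.items():
        noisy[(x, y)] = noisy.get((x, y), 0.0) + p * (1.0 - rate)
        noisy[(x, 1 - y)] = noisy.get((x, 1 - y), 0.0) + p * rate
    return replace(problem, joint=noisy)


def zero_one(prediction: int, y: int) -> float:
    """0-1 loss for classification."""
    return 0.0 if prediction == y else 1.0


clean = Problem(
    inputs=[0, 1],
    responses=[0, 1],
    joint={(0, 0): 0.5, (1, 1): 0.5},     # the label equals the input
    loss=zero_one,
    predictors=[lambda x: x, lambda x: 0, lambda x: 1],
)
noisy = flip_labels(clean, rate=0.1)

print(restricted_bayes_risk(clean), restricted_bayes_risk(noisy))
```

In this toy case the descriptor moves from 0 to about 0.1 when 10% of the labels are flipped, so a small change to the joint distribution produces a small change in the descriptor. The paper's stability results are statements of this general kind, expressed through the Risk distance rather than through a specific noise model.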
Overall, this paper provides a comprehensive framework to quantify and understand how various compromises encountered in practical machine learning affect the stability and geometric structure of supervised learning problems.