Abstract: We introduce a notion of distance between supervised learning problems, which we call the Risk distance. This optimal-transport-inspired distance facilitates stability results; one can quantify how seriously issues like sampling bias, noise, limited data, and approximations might change a given problem by bounding how much these modifications can move the problem under the Risk distance. With the distance established, we explore the geometry of the resulting space of supervised learning problems, providing explicit geodesics and proving that the set of classification problems is dense in a larger class of problems. We also provide two variants of the Risk distance: one that incorporates specified weights on a problem's predictors, and one that is more sensitive to the contours of a problem's risk landscape.

What problem does this paper attempt to address?

The paper studies the geometry and stability of supervised learning problems. The authors introduce the "Risk distance," a distance inspired by optimal transport theory that measures how similar two supervised learning problems are. With it, one can quantify how much issues such as noise, sampling bias, limited data, and approximation change a problem; these issues arise in practice, for example during data collection.

In the paper, a supervised learning problem is defined as a quintuple consisting of an input space, a response space, a joint probability distribution, a loss function, and a set of predictors. The study uses the Risk distance to measure how changing one or more of these components affects the problem as a whole, thereby addressing two main questions:

1. To what extent does a compromise (such as noise or data skew) change the problem and its descriptive features?
2. What is the effect of combining several compromises? Can a series of small changes be guaranteed not to have a significant impact on the problem to be solved?

The Risk distance is constructed using optimal transport theory and is similar to the Gromov-Wasserstein distance, which gives a geometric interpretation of the distance between problems. The paper also examines how modifications to the loss function, the predictor set, and the input and response spaces affect the stability of learning problems, and it proves that certain descriptors (such as the restricted Bayesian risk) are stable under the Risk distance. In addition, the authors study geometric properties of the space of problems under the Risk distance, such as optimal couplings and correlations, as well as the density of classification problems in this space. Finally, the paper proposes two variants of the Risk distance: one that incorporates weights on the predictors, and one that is more sensitive to the contours of the risk landscape.
Overall, this paper provides a comprehensive framework to quantify and understand how various compromises encountered in practical machine learning affect the stability and geometric structure of supervised learning problems.
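
To make the quintuple description concrete, the following is a minimal sketch of a finite supervised learning problem and one descriptor, the restricted Bayes risk, taken here to mean the smallest expected loss attainable within the predictor set. It does not compute the Risk distance itself, whose definition is not reproduced in this summary. The names `FiniteProblem`, `risk`, `restricted_bayes_risk`, and `flip_labels` are hypothetical and chosen only for illustration; the label-noise model (flipping a binary label with a fixed probability) is likewise an assumption, not the paper's construction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Sequence, Tuple


@dataclass(frozen=True)
class FiniteProblem:
    """Hypothetical container for the five components of a problem."""
    inputs: Sequence[Hashable]                        # input space
    responses: Sequence[Hashable]                     # response space
    joint: Dict[Tuple[Hashable, Hashable], float]     # P(x, y)
    loss: Callable[[Hashable, Hashable], float]       # loss(prediction, y)
    predictors: Sequence[Callable[[Hashable], Hashable]]


def risk(problem: FiniteProblem, h: Callable[[Hashable], Hashable]) -> float:
    """Expected loss of predictor h under the joint distribution."""
    return sum(p * problem.loss(h(x), y) for (x, y), p in problem.joint.items())


def restricted_bayes_risk(problem: FiniteProblem) -> float:
    """Smallest risk attainable within the problem's predictor set."""
    return min(risk(problem, h) for h in problem.predictors)


def flip_labels(problem: FiniteProblem, rate: float) -> FiniteProblem:
    """Assumed noise model: flip a binary label (0 or 1) with probability rate."""
    noisy: Dict[Tuple[Hashable, Hashable], float] = {}
    for (x, y), p in problem.joint.items():
        noisy[(x, y)] = noisy.get((x, y), 0.0) + (1.0 - rate) * p
        noisy[(x, 1 - y)] = noisy.get((x, 1 - y), 0.0) + rate * p
    return FiniteProblem(problem.inputs, problem.responses, noisy,
                         problem.loss, problem.predictors)


# Toy problem: the label equals the input, scored with the 0-1 loss.
clean = FiniteProblem(
    inputs=[0, 1],
    responses=[0, 1],
    joint={(0, 0): 0.5, (1, 1): 0.5},
    loss=lambda prediction, y: float(prediction != y),
    predictors=[lambda x: x, lambda x: 0, lambda x: 1],
)
noisy = flip_labels(clean, rate=0.1)

print(restricted_bayes_risk(clean), restricted_bayes_risk(noisy))
```

In this toy case the identity predictor is perfect on the clean problem, and flipping 10% of the labels raises the restricted Bayes risk to roughly 0.1. A stability result of the kind the summary describes would bound such a change in a descriptor by how far the modification moves the problem under the Risk distance.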

Geometry and Stability of Supervised Learning Problems
