Abstract: The quintessential learning algorithm of empirical risk minimization (ERM) is known to fail in various settings for which uniform convergence does not characterize learning. It is therefore unsurprising that the practice of machine learning is rife with considerably richer algorithmic techniques for successfully controlling model capacity. Nevertheless, no such technique or principle has broken away from the pack to characterize optimal learning in these more general settings. The purpose of this work is to characterize the role of regularization in perhaps the simplest setting for which ERM fails: multiclass learning with arbitrary label sets. Using one-inclusion graphs (OIGs), we exhibit optimal learning algorithms that dovetail with tried-and-true algorithmic principles: Occam's Razor as embodied by structural risk minimization (SRM), the principle of maximum entropy, and Bayesian reasoning. Most notably, we introduce an optimal learner which relaxes structural risk minimization along two dimensions: it allows the regularization function to be "local" to datapoints, and it uses an unsupervised learning stage to learn this regularizer at the outset. We justify these relaxations by showing that they are necessary: removing either dimension fails to yield a near-optimal learner. We also extract from OIGs a combinatorial sequence we term the Hall complexity, which is the first to characterize a problem's transductive error rate exactly. Lastly, we introduce a generalization of OIGs and the transductive learning setting to the agnostic case, where we show that optimal orientations of Hamming graphs -- judged using nodes' outdegrees minus a system of node-dependent credits -- characterize optimal learners exactly. We demonstrate that an agnostic version of the Hall complexity again characterizes error rates exactly, and exhibit an optimal learner using maximum entropy programs.

G-Optimal Design with Laplacian Regularization

Laplacian Regularized D-optimal Design for active learning and its application to image retrieval

G-Optimal Feature Selection with Laplacian regularization

Manifold Optimal Experimental Design Via Dependence Maximization for Active Learning

Laplacian optimal design for image retrieval

Active Learning Based on Locally Linear Reconstruction

Spatial Batch Optimal Design Based on Self-Learning Gaussian Process Models for LPCVD Processes

Locally regressive G-optimal design for image retrieval

Optimal design for linear models via gradient flow

Neighborhood Preserving D-Optimal Design for Active Learning and Its Application to Terrain Classification

Active Learning for Social Image Retrieval Using Locally Regressive Optimal Design

Regularization and Optimal Multiclass Learning

A-Optimal Projection for Image Representation

Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision

New Balanced Active Learning Model and Optimization Algorithm

GORFLM: Globally Optimal Robust Fitting for Linear Model

Convex Experimental Design Using Manifold Structure for Image Retrieval

Anomaly Detection via Graphical Lasso

OAL: Enhancing OOD Detection Using Latent Diffusion

Optimal Regularization for a Data Source

Active learning on manifolds