Abstract: The quintessential learning algorithm of empirical risk minimization (ERM) is known to fail in various settings for which uniform convergence does not characterize learning. It is therefore unsurprising that the practice of machine learning is rife with considerably richer algorithmic techniques for successfully controlling model capacity. Nevertheless, no such technique or principle has broken away from the pack to characterize optimal learning in these more general settings. The purpose of this work is to characterize the role of regularization in perhaps the simplest setting for which ERM fails: multiclass learning with arbitrary label sets. Using one-inclusion graphs (OIGs), we exhibit optimal learning algorithms that dovetail with tried-and-true algorithmic principles: Occam's Razor as embodied by structural risk minimization (SRM), the principle of maximum entropy, and Bayesian reasoning. Most notably, we introduce an optimal learner that relaxes structural risk minimization along two dimensions: it allows the regularization function to be "local" to datapoints, and it uses an unsupervised learning stage to learn this regularizer at the outset. We justify these relaxations by showing that they are necessary: removing either one fails to yield a near-optimal learner. We also extract from OIGs a combinatorial sequence we term the Hall complexity, which is the first to characterize a problem's transductive error rate exactly. Lastly, we introduce a generalization of OIGs and the transductive learning setting to the agnostic case, where we show that optimal orientations of Hamming graphs (judged using nodes' outdegrees minus a system of node-dependent credits) characterize optimal learners exactly. We demonstrate that an agnostic version of the Hall complexity again characterizes error rates exactly, and we exhibit an optimal learner using maximum entropy programs.
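To make the one-inclusion graph idea concrete, here is a minimal sketch, assuming the hypothesis class is given as a finite list of distinct label patterns on n fixed points. Patterns that agree everywhere except at coordinate i share a hyperedge; an orientation picks one "head" pattern per hyperedge, and a pattern's outdegree counts the hyperedges containing it whose head is some other pattern. For a deterministic transductive learner on these fixed points, the worst-case error is the maximum outdegree divided by n. The function names are hypothetical, the search is brute force and exponential (tiny inputs only), and this is neither the paper's learner nor a computation of the Hall complexity.

```python
from collections import defaultdict
from itertools import product


def one_inclusion_hyperedges(patterns):
    """Group label patterns into one-inclusion hyperedges.

    patterns: list of distinct, equal-length tuples (the class projected onto n points).
    Returns (i, members) pairs; members are indices of patterns that agree on
    every coordinate except possibly coordinate i.
    """
    n = len(patterns[0])
    edges = []
    for i in range(n):
        groups = defaultdict(list)
        for idx, p in enumerate(patterns):
            # Key each pattern by its labels with coordinate i deleted.
            groups[p[:i] + p[i + 1:]].append(idx)
        for members in groups.values():
            edges.append((i, members))
    return edges


def outdegrees(num_patterns, edges, heads):
    """Outdegree of each pattern: hyperedges containing it whose head is another pattern."""
    out = [0] * num_patterns
    for (_, members), head in zip(edges, heads):
        for v in members:
            if v != head:
                out[v] += 1
    return out


def best_orientation(patterns):
    """Brute-force an orientation minimizing the maximum outdegree."""
    edges = one_inclusion_hyperedges(patterns)
    best = None
    # Each hyperedge must choose one of its own members as head.
    for heads in product(*(members for _, members in edges)):
        worst = max(outdegrees(len(patterns), edges, heads))
        if best is None or worst < best[0]:
            best = (worst, heads)
    return best[0], edges, best[1]


# Usage: the full binary class on two points (a 4-cycle in the OIG).
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
max_out, edges, heads = best_orientation(square)
print(max_out / len(square[0]))  # → 0.5
```

Hyperedges of size one are kept for simplicity; their only possible head is their sole member, so they never add outdegree. A randomized learner corresponds to a fractional orientation and can do better than this deterministic search.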

Guaranteed classification via regularized similarity learning

Generalization Bounds for Metric and Similarity Learning

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

A Regularized Approach for Geodesic-Based Semisupervised Multimanifold Learning

Relative Regularity Conditions and Linear Regularity Properties for Split Feasibility Problems in Normed Linear Spaces

Learning Similarity Metric with SVM

Unified Regularity Measures for Sample-wise Learning and Generalization

Generalization Analysis of Fredholm Kernel Regularized Classifiers

A Duality Approach to Regularized Learning Problems in Banach Spaces

Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

Generalization Bounds of Regularization Algorithms Derived Simultaneously through Hypothesis Space Complexity, Algorithmic Stability and Data Quality

Confusion Matrix Stability Bounds for Multiclass Classification

Generalization errors of Laplacian regularized least squares regression

Boosting Certified Robustness Via an Expectation-Based Similarity Regularization

Unified Locally Linear Classifiers with Diversity-Promoting Anchor Points

Analysis of Regularized Learning for Linear-functional Data in Banach Spaces

Learning Fair Classifiers via Min-Max F-divergence Regularization

Discriminative Similarity for Clustering and Semi-Supervised Learning

Regularization and Optimal Multiclass Learning

Simultaneous Learning Of Affinity Matrix And Laplacian Regularized Least Squares For Semi-Supervised Classification

Robust Regularized Low-Rank Matrix Models for Regression and Classification