Abstract: We investigate the in-distribution generalization of machine learning algorithms. We depart from traditional complexity-based approaches by analyzing information-theoretic bounds that quantify the dependence between a learning algorithm and the training data. We consider two categories of generalization guarantees: 1) Guarantees in expectation: These bounds measure performance in the average case. The dependence between the algorithm and the data is often captured by information measures. While these measures offer an intuitive interpretation, they overlook the geometry of the algorithm's hypothesis class. We introduce bounds that use the Wasserstein distance to incorporate this geometry, together with a systematic method to derive bounds that capture the dependence between the algorithm and an individual datum, and between the algorithm and subsets of the training data. 2) PAC-Bayesian guarantees: These bounds characterize performance with high probability. The dependence between the algorithm and the data is often measured by the relative entropy. We establish connections between the Seeger–Langford and Catoni bounds, revealing that the former is optimized by the Gibbs posterior. We introduce novel, tighter bounds for various types of loss functions. To achieve this, we introduce a new technique to optimize parameters in probabilistic statements. To study the limitations of these approaches, we present a counter-example where most of the information-theoretic bounds fail while traditional approaches do not. Finally, we explore the relationship between privacy and generalization. We show that algorithms with a bounded maximal leakage generalize. For discrete data, we derive new bounds for differentially private algorithms that guarantee generalization even with a constant privacy parameter, in contrast to previous bounds in the literature.

Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis

Generalization Bounds for Stochastic Gradient Langevin Dynamics: A Unified View Via Information Leakage Analysis

Conditional Mutual Information-Based Generalization Bound for Meta Learning

Provable Generalization of Overparameterized Meta-learning Trained with SGD

Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning

Towards Understanding Generalization in Gradient-Based Meta-Learning

Theoretical Analysis of Meta Reinforcement Learning: Generalization Bounds and Convergence Guarantees

Towards Generalization Beyond Pointwise Learning: A Unified Information-theoretic Perspective

Transfer Meta-Learning: Information-Theoretic Bounds and Information Meta-Risk Minimization

On the Tightness of Information-Theoretic Bounds on Generalization Error of Learning Algorithms

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Information-theoretic generalization bounds for black-box learning algorithms

Generalization Error Bounds for Iterative Learning Algorithms with Bounded Updates

Fine-grained Generalization Analysis of Vector-valued Learning

Generalization Analysis for Game-Theoretic Machine Learning

Generalization Bounds for Metric and Similarity Learning

A unified framework for information-theoretic generalization bounds

Generalization Bounds via Conditional $f$-Information

On the Generalization Error of Meta Learning for the Gibbs Algorithm

An Information-Theoretic Approach to Generalization Theory

Bounds on the Generalization Error in Active Learning