Abstract: We investigate the in-distribution generalization of machine learning algorithms. We depart from traditional complexity-based approaches by analyzing information-theoretic bounds that quantify the dependence between a learning algorithm and the training data. We consider two categories of generalization guarantees. 1) Guarantees in expectation: These bounds measure performance in the average case, and the dependence between the algorithm and the data is often captured by information measures. While these measures offer an intuitive interpretation, they overlook the geometry of the algorithm's hypothesis class. We introduce bounds that use the Wasserstein distance to incorporate this geometry, together with a systematic method to derive bounds that capture the dependence between the algorithm and an individual datum, and between the algorithm and subsets of the training data. 2) PAC-Bayesian guarantees: These bounds hold with high probability, and the dependence between the algorithm and the data is often measured by the relative entropy. We establish connections between the Seeger–Langford and Catoni bounds, revealing that the former is optimized by the Gibbs posterior. We also introduce novel, tighter bounds for various types of loss functions, obtained with a new technique for optimizing parameters in probabilistic statements. To study the limitations of these approaches, we present a counterexample where most of the information-theoretic bounds fail while traditional approaches do not. Finally, we explore the relationship between privacy and generalization. We show that algorithms with a bounded maximal leakage generalize. For discrete data, we derive new bounds for differentially private algorithms that guarantee generalization even with a constant privacy parameter, in contrast to previous bounds in the literature.
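As a minimal illustration of the Gibbs posterior mentioned in the abstract, the sketch below checks a standard fact numerically: over a finite hypothesis class, the posterior proportional to prior × exp(−λ · empirical risk) minimizes the linear trade-off λ · E[empirical risk] + KL(posterior ‖ prior). This is not the thesis's own derivation for the Seeger–Langford bound; the prior, the empirical risks, the value of `lam`, and all function names are hypothetical choices made only for this example.

```python
import math
import random


def gibbs_posterior(prior, emp_risks, lam):
    """Return the distribution proportional to prior(h) * exp(-lam * risk(h))."""
    weights = [p * math.exp(-lam * r) for p, r in zip(prior, emp_risks)]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


def kl_divergence(rho, prior):
    """Relative entropy KL(rho || prior) for discrete distributions."""
    return sum(q * math.log(q / p) for q, p in zip(rho, prior) if q > 0)


def objective(rho, prior, emp_risks, lam):
    """Linear PAC-Bayes style trade-off: lam * E_rho[risk] + KL(rho || prior)."""
    expected_risk = sum(q * r for q, r in zip(rho, emp_risks))
    return lam * expected_risk + kl_divergence(rho, prior)


# Hypothetical setup: four hypotheses, uniform prior, made-up empirical risks.
prior = [0.25, 0.25, 0.25, 0.25]
emp_risks = [0.10, 0.30, 0.50, 0.20]
lam = 5.0

gibbs = gibbs_posterior(prior, emp_risks, lam)
gibbs_value = objective(gibbs, prior, emp_risks, lam)

# The minimum has the closed form -log sum_h prior(h) * exp(-lam * risk(h)).
closed_form = -math.log(
    sum(p * math.exp(-lam * r) for p, r in zip(prior, emp_risks))
)

# Compare against randomly drawn posteriors; none should beat the Gibbs one.
rng = random.Random(0)


def random_posterior(k):
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    return [x / total for x in raw]


random_values = [
    objective(random_posterior(len(prior)), prior, emp_risks, lam)
    for _ in range(1000)
]
```

Here `gibbs_value` should agree with `closed_form` up to floating-point error, and every entry of `random_values` should be at least as large. The random search is only a sanity check under these assumptions, not a proof of optimality.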

Information-Theoretic Generalization Bounds for Batch Reinforcement Learning

Generalization Bounds for Stochastic Gradient Langevin Dynamics: A Unified View Via Information Leakage Analysis

Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning

Information-Theoretic Confidence Bounds for Reinforcement Learning

A unified framework for information-theoretic generalization bounds

An Information-Theoretic Approach to Generalization Theory

Information-theoretic generalization bounds for black-box learning algorithms

Information Theoretic Lower Bounds for Information Theoretic Upper Bounds

Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis

Understanding What Affects the Generalization Gap in Visual Reinforcement Learning: Theory and Empirical Evidence

Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Renyi's Entropy Perspective

Generalization Bounds via Conditional $f$-Information

Learning Representations in Reinforcement Learning: An Information Bottleneck Approach

An Information-Theoretic Analysis of Bayesian Reinforcement Learning

Information-Theoretic Generalization Bounds for Transductive Learning and its Applications

On the Tightness of Information-Theoretic Bounds on Generalization Error of Learning Algorithms

An Information Theoretic Approach to Interaction-Grounded Learning

Generalization Analysis for Game-Theoretic Machine Learning

Theoretical Analysis of Meta Reinforcement Learning: Generalization Bounds and Convergence Guarantees

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Fine-grained Generalization Analysis of Vector-valued Learning