Abstract: We investigate the in-distribution generalization of machine learning algorithms. We depart from traditional complexity-based approaches by analyzing information-theoretic bounds that quantify the dependence between a learning algorithm and the training data. We consider two categories of generalization guarantees: 1) Guarantees in expectation: These bounds measure performance in the average case. Here, the dependence between the algorithm and the data is often captured by information measures. While these measures offer an intuitive interpretation, they overlook the geometry of the algorithm's hypothesis class. We introduce bounds based on the Wasserstein distance that incorporate this geometry, together with a systematic method for deriving bounds that capture the dependence between the algorithm and an individual datum, and between the algorithm and subsets of the training data. 2) PAC-Bayesian guarantees: These bounds characterize performance with high probability. Here, the dependence between the algorithm and the data is often measured by the relative entropy. We establish connections between the Seeger--Langford and Catoni bounds, revealing that the former is optimized by the Gibbs posterior. We also introduce novel, tighter bounds for various types of loss functions, obtained through a new technique for optimizing parameters in probabilistic statements. To study the limitations of these approaches, we present a counter-example where most of the information-theoretic bounds fail while traditional approaches do not. Finally, we explore the relationship between privacy and generalization. We show that algorithms with bounded maximal leakage generalize. For discrete data, we derive new bounds for differentially private algorithms that guarantee generalization even with a constant privacy parameter, in contrast to previous bounds in the literature.

On Lower Bounds for Statistical Learning Theory

Information Theoretic Lower Bounds for Information Theoretic Upper Bounds

Information Theoretic Lower Bounds on Negative Log Likelihood

Lower Bounds on the Oracle Complexity of Nonsmooth Convex Optimization via Information Theory

Information-Theoretic Foundations for Machine Learning

Lower Bounds for Learning Distributions under Communication Constraints via Fisher Information

Information Lower Bounds for Robust Mean Estimation

Low coordinate degree algorithms I: Universality of computational thresholds for hypothesis testing

Asymptotic Estimates in Information Theory with Non-Vanishing Error Probabilities

On the Tightness of Information-Theoretic Bounds on Generalization Error of Learning Algorithms

An Information-Theoretic Analysis for Transfer Learning: Error Bounds and Applications

An Information-Theoretic Approach to Generalization Theory

Information-Theoretic Generalization Bounds for Transductive Learning and its Applications

Information-Theoretic Lower Bounds on Bayes Risk in Decentralized Estimation

A unified framework for information-theoretic generalization bounds

Information-theoretic generalization bounds for black-box learning algorithms

An information-theoretic lower bound in time-uniform estimation

High-probability minimax lower bounds

Information Theory and its Relation to Machine Learning

On Variational Bounds of Mutual Information

$L_q$ Lower Bounds on Distributed Estimation via Fisher Information