Abstract: We present a new family of information-theoretic generalization bounds, in which the training loss and the population loss are compared through a jointly convex function. This function is upper-bounded in terms of the disintegrated, samplewise, evaluated conditional mutual information (CMI), an information measure that depends on the losses incurred by the selected hypothesis, rather than on the hypothesis itself, as is common in probably approximately correct (PAC)-Bayesian results. We demonstrate the generality of this framework by recovering and extending previously known information-theoretic bounds. Furthermore, using the evaluated CMI, we derive a samplewise, average version of Seeger's PAC-Bayesian bound, where the convex function is the binary KL divergence. In some scenarios, this novel bound results in a tighter characterization of the population loss of deep neural networks than previous bounds. Finally, we derive high-probability versions of some of these average bounds. We demonstrate the unifying nature of the evaluated CMI bounds by using them to recover average and high-probability generalization bounds for multiclass classification with finite Natarajan dimension.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper introduces a new family of information-theoretic generalization bounds, expressed in terms of the disintegrated, samplewise, evaluated conditional mutual information (e-CMI). Specifically, the authors address the following problems:

1. **Improve the tightness of existing generalization bounds**: Existing generalization bounds perform poorly on deep neural networks and, in particular, become looser as training proceeds. The authors aim to obtain tighter bounds by using the e-CMI, which depends on the losses incurred by the selected hypothesis rather than on the hypothesis itself.
2. **Extend and unify existing theoretical results**: The authors show that the proposed framework recovers and extends previously known information-theoretic generalization bounds, and that it accommodates further choices of the convex comparison function, such as the binary KL divergence.
3. **Handle multiclass classification problems**: The authors use the e-CMI framework to recover average and high-probability generalization bounds for multiclass classification with finite Natarajan dimension.
4. **Verify by numerical experiments**: Experiments on the MNIST and CIFAR10 datasets evaluate the proposed bounds in practical deep-learning scenarios, in particular comparing the binary KL bound with the existing square-root and linear bounds.

### Main contributions

- **A new family of generalization bounds**: Several bounds based on the e-CMI are proposed, including a square-root bound, a linear bound, and a binary KL bound.
- **Application of the evaluated CMI**: The paper shows how to use the e-CMI to recover and extend existing information-theoretic generalization bounds.
- **High-probability bounds**: High-probability versions of some of the average bounds are derived and applied to multiclass classification.
- **Numerical experiments**: Experiments show that the proposed bounds are tighter than existing bounds in some scenarios.

### Mathematical formulas

- **Conditional mutual information (CMI)**:
  \[ I(X; Y \mid Z) = D\left(P_{XY|Z} \,\middle\|\, P_{X|Z} P_{Y|Z}\right) \]
- **Disintegrated conditional mutual information**:
  \[ I_z(X; Y) = D\left(P_{XY|Z = z} \,\middle\|\, P_{X|Z = z} P_{Y|Z = z}\right) \]
- **Square-root bound**:
  \[ \left|\mathbb{E}_{\tilde{Z}, S, R}\left[L_D(A, \tilde{Z}_S, R)\right] - \hat{L}\right| \leq \frac{1}{n}\sum_{i = 1}^n \sqrt{2 I\left(\ell(A(\tilde{Z}_S, R), \tilde{Z}_i); S_i \mid \tilde{Z}\right)} \]
- **Binary KL bound**:
  \[ d\left(\hat{L} \,\middle\|\, \frac{\hat{L} + L_D}{2}\right) \leq \frac{1}{n}\sum_{i = 1}^n I\left(\ell(A(\tilde{Z}_S, R), \tilde{Z}_i); S_i \mid \tilde{Z}\right) \]

Here $\hat{L}$ denotes the training loss, $L_D$ the population loss, and $d(\cdot\|\cdot)$ the binary KL divergence. Together, these bounds give a tool that, in some scenarios, characterizes the population loss of deep neural networks more tightly than previous bounds.
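The square-root and binary KL bounds above can be compared numerically once the per-sample CMI terms are known. The following is a minimal sketch, not the authors' code: it assumes losses in [0, 1], information measured in nats, and treats the per-sample CMI values as given inputs (estimating them is a separate problem). All function names are hypothetical. The binary KL bound is inverted by bisection to obtain an upper bound on the population loss.

```python
import math


def binary_kl(p, q):
    """Binary KL divergence d(p || q) in nats, for p, q in [0, 1]."""
    eps = 1e-12
    q = min(max(q, eps), 1.0 - eps)  # avoid log(0) at the endpoints
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def kl_inverse_upper(p, c, tol=1e-10):
    """Largest q in [p, 1] with d(p || q) <= c, found by bisection.

    This works because d(p || q) is increasing in q for q >= p.
    """
    lo, hi = p, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(p, mid) <= c:
            lo = mid
        else:
            hi = mid
    return lo


def sqrt_bound(train_loss, cmi_terms):
    """Upper bound from the square-root bound: L_hat + (1/n) sum sqrt(2 I_i)."""
    n = len(cmi_terms)
    return train_loss + sum(math.sqrt(2.0 * v) for v in cmi_terms) / n


def binary_kl_bound(train_loss, cmi_terms):
    """Upper bound on L_D from d(L_hat || (L_hat + L_D) / 2) <= (1/n) sum I_i."""
    c = sum(cmi_terms) / len(cmi_terms)
    q = kl_inverse_upper(train_loss, c)  # q bounds (L_hat + L_D) / 2
    return min(1.0, 2.0 * q - train_loss)


if __name__ == "__main__":
    # Illustrative values only; these are assumptions, not results from the paper.
    train_loss = 0.05
    cmi_terms = [0.02] * 10
    print("square-root bound:", sqrt_bound(train_loss, cmi_terms))
    print("binary KL bound:  ", binary_kl_bound(train_loss, cmi_terms))
```

For the illustrative inputs in the script (small training loss, small CMI terms), the binary KL route yields the smaller upper bound, which is consistent with the regime in which the summary describes it as tighter. The comparison can reverse for other inputs, so the sketch should not be read as a general ranking of the two bounds.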

A New Family of Generalization Bounds Using Samplewise Evaluated CMI
