Abstract: In this work, we introduce novel information-theoretic generalization bounds using the conditional $f$-information framework, an extension of the traditional conditional mutual information (MI) framework. We provide a generic approach to derive generalization bounds via $f$-information in the supersample setting, applicable to both bounded and unbounded loss functions. Unlike previous MI-based bounds, our proof strategy does not rely on upper bounding the cumulant-generating function (CGF) in the variational formula of MI. Instead, we set the CGF or its upper bound to zero by carefully selecting the measurable function invoked in the variational formula. Although some of our techniques are partially inspired by recent advances in the coin-betting framework (e.g., Jang et al. (2023)), our results are independent of any previous findings from regret guarantees of online gambling algorithms. Additionally, our newly derived MI-based bound recovers many previous results and improves our understanding of their potential limitations. Finally, we empirically compare various $f$-information measures for generalization, demonstrating the improvement of our new bounds over the previous bounds.
What problem does this paper attempt to address?
The core problem this paper addresses is bounding the generalization error in machine learning. It does so by introducing the conditional f-information framework and using it to derive new generalization bounds. Specifically:
1. **Limitations of Existing Methods**:
- Traditional bounds based on mutual information (MI) have made significant progress in analyzing learning algorithms for non-convex problems (such as deep learning), but they have limitations. For example, the MI measure can be infinite even when the actual generalization error is very small.
- Although the conditional mutual information (CMI) framework yields bounds that are always finite, these bounds are still not tight enough in some cases.
2. **Research Objectives**:
- Propose a general method for deriving generalization bounds with the conditional f-information framework, applicable to both bounded and unbounded loss functions.
- Avoid upper bounding the cumulant-generating function (CGF), a step that earlier MI-based proofs rely on. Instead, choose the measurable function in the variational formula so that the CGF, or an upper bound on it, is zero.
- Explore how different f-divergences (such as the squared Hellinger distance and the Jensen-Shannon divergence) can be used in the derivation to obtain tighter bounds.
3. **Main Contributions**:
- In the case of bounded loss differences, a new variational formula is proposed (see Lemma 3.1), and a conditional f-information generalization bound is derived from it.
- For the KL-divergence case, an "oracle" CMI bound is provided (see Theorem 3.1), which recovers many previous CMI bounds and reveals the potential looseness of these bounds in the low-empirical-risk setting.
- Other generalization bounds based on f-information are introduced, including the looser χ²-information bound (see Theorem B.2) and the tighter squared Hellinger information (see Theorem 3.2) and Jensen-Shannon information (see Theorem 3.3) bounds.
- The framework is extended to the case of unbounded loss differences. An improved variational formula is proposed (see Lemma 4.1), and new f-information generalization bounds are provided in this case (see Theorem 4.1).
4. **Experimental Results**:
- Empirical results show that the newly proposed squared Hellinger information bound improves on previous bounds.
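The CGF-free strategy described in the research objectives can be illustrated with the standard Donsker-Varadhan variational formula for the KL divergence. The specific function $g$ chosen below is an illustrative assumption in the spirit of coin betting; the paper's Lemma 3.1 may use a different or more general construction.

$$
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sup_{g}\; \Big\{ \mathbb{E}_{P}\big[g(X)\big] \;-\; \log \mathbb{E}_{Q}\big[e^{g(X)}\big] \Big\}
$$

The second term is the CGF of $g$ under $Q$. Earlier MI-based proofs typically upper bound it, for example through a sub-Gaussian assumption. Suppose instead that $\Delta$ is a random variable with $\mathbb{E}_{Q}[\Delta] = 0$ and $|\Delta| \le 1$, and take $g = \log(1 + a\Delta)$ for a fixed $a \in (-1, 1)$. Then

$$
\log \mathbb{E}_{Q}\big[e^{g}\big] \;=\; \log\big(1 + a\,\mathbb{E}_{Q}[\Delta]\big) \;=\; 0,
\qquad\text{so}\qquad
\mathbb{E}_{P}\big[\log(1 + a\Delta)\big] \;\le\; D_{\mathrm{KL}}(P \,\|\, Q).
$$

With this choice the CGF term vanishes exactly, so no separate upper bound on it is needed.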
Through these contributions, the paper aims to provide tighter and more broadly applicable theoretical support for understanding the generalization performance of machine learning algorithms, especially when dealing with complex and high-dimensional data.
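To make the divergences named above concrete, the following is a minimal sketch that evaluates KL, χ², squared Hellinger, and Jensen-Shannon divergences on small discrete distributions, and the corresponding (unconditional) f-information of a joint table as the divergence between the joint and the product of its marginals. The function names, the example table, and the normalization of the squared Hellinger distance (no factor of 1/2) are assumptions made for illustration; they are not taken from the paper, and the sketch does not implement the paper's conditional, supersample-based quantities.

```python
import math

# All functions assume strictly positive probability vectors of equal length.

def kl(p, q):
    """KL divergence: sum p * log(p / q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def chi2(p, q):
    """Chi-squared divergence: sum (p - q)^2 / q."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

def sq_hellinger(p, q):
    """Squared Hellinger distance, convention without the 1/2 factor."""
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

def js(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def f_information(joint, divergence):
    """Divergence between a joint table and the product of its marginals."""
    px = [sum(row) for row in joint]          # marginal over rows
    py = [sum(col) for col in zip(*joint)]    # marginal over columns
    flat_joint = [v for row in joint for v in row]
    flat_prod = [a * b for a in px for b in py]  # same row-major order
    return divergence(flat_joint, flat_prod)

# Hypothetical 2x2 joint distribution with positive dependence.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

for name, div in [("KL", kl), ("chi2", chi2),
                  ("sq_hellinger", sq_hellinger), ("JS", js)]:
    print(name, f_information(joint, div))
```

On this example the squared Hellinger value is the smallest and the χ² value the largest of the three, consistent with the standard inequalities H² ≤ KL ≤ χ² under this convention. This ordering of the divergences does not by itself determine which generalization bound is tighter, since each bound transforms its information measure differently.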