Fast Rate Generalization Error Bounds: Variations on a Theme

Xuetong Wu, Jonathan H. Manton, Uwe Aickelin, Jingge Zhu
DOI: https://doi.org/10.1109/itw54588.2022.9965761
2022-01-01
Abstract: A recent line of work, initiated by [1] and [2], has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error takes the form $O(\sqrt{\lambda I/n})$, where $\lambda$ is an assumption-dependent coefficient and $I$ is an information-theoretic quantity such as the mutual information between the data sample and the learned hypothesis. However, such a learning rate is typically considered "slow" compared with the "fast rate" of $O(1/n)$ obtained in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate: a fast rate can still be obtained from this bound by evaluating $\lambda$ under an appropriate assumption. Furthermore, we identify the key conditions needed for a fast-rate generalization error, which we call the $(\eta, c)$-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk with a convergence rate of $O(1/n)$ for specific learning algorithms such as empirical risk minimization. Finally, analytical examples are given to show the effectiveness of the bounds.
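To see why the square root alone does not force a slow rate, here is a short worked step under an illustrative assumption (not a statement taken from the paper): suppose the product $\lambda I$ itself shrinks like $C/n$ for some constant $C > 0$. Then

$$\sqrt{\frac{\lambda I}{n}} \le \sqrt{\frac{C/n}{n}} = \frac{\sqrt{C}}{n} = O\left(\frac{1}{n}\right).$$

If instead $\lambda$ and $I$ stay bounded by constants independent of $n$, the same bound only gives $\sqrt{\lambda I}/\sqrt{n} = O(1/\sqrt{n})$, which is the "slow" rate.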
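A minimal numerical sketch of a fast $O(1/n)$ rate for empirical risk minimization, using a hypothetical setting chosen for illustration (not necessarily one of the paper's analytical examples): data $Z_i \sim \mathcal{N}(\mu, \sigma^2)$, squared loss $\ell(w, z) = (w - z)^2$, and ERM hypothesis $W = \bar{Z}$ (the sample mean). Here the population risk is $L(w) = (w - \mu)^2 + \sigma^2$, so $\mathbb{E}[L(W)] = \sigma^2 + \sigma^2/n$ while $\mathbb{E}[L_S(W)] = \sigma^2 (n-1)/n$, giving an expected generalization error of exactly $2\sigma^2/n$. The function name `expected_gen_error` and its parameters are illustrative assumptions.

```python
import random


def expected_gen_error(n, sigma=1.0, mu=0.0, trials=5000, seed=0):
    """Monte Carlo estimate of E[L(W) - L_S(W)] for ERM under squared loss.

    Hypothetical illustration: Gaussian data, loss (w - z)^2, and the
    ERM hypothesis W = sample mean.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        z = [rng.gauss(mu, sigma) for _ in range(n)]
        w = sum(z) / n  # ERM solution for squared loss is the sample mean
        empirical_risk = sum((w - zi) ** 2 for zi in z) / n
        population_risk = (w - mu) ** 2 + sigma ** 2  # closed form of E_Z[(w - Z)^2]
        total += population_risk - empirical_risk
    return total / trials


# Compare the Monte Carlo estimate against the analytical value 2 * sigma^2 / n.
for n in (10, 20, 40, 80):
    print(n, round(expected_gen_error(n), 4), 2.0 / n)
```

The printed estimates should track $2/n$ (with $\sigma = 1$), halving each time $n$ doubles, rather than shrinking by only a factor of $\sqrt{2}$ as an $O(1/\sqrt{n})$ rate would.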