Abstract: Generalization error (also known as the out-of-sample error) measures how well the hypothesis learned from training data generalizes to previously unseen data. Proving tight generalization error bounds is a central question in statistical learning theory. In this paper, we obtain generalization error bounds for learning general non-convex objectives, a problem that has attracted significant attention in recent years. We develop a new framework, termed Bayes-Stability, for proving algorithm-dependent generalization error bounds. The new framework combines ideas from both PAC-Bayesian theory and the notion of algorithmic stability. Applying the Bayes-Stability method, we obtain new data-dependent generalization bounds for stochastic gradient Langevin dynamics (SGLD) and several other noisy gradient methods (e.g., with momentum, mini-batch and acceleration, Entropy-SGD). Our result recovers (and is typically tighter than) a recent result in Mou et al. (2018) and improves upon the results in Pensia et al. (2018). Our experiments demonstrate that our data-dependent bounds can distinguish randomly labelled data from normal data, which provides an explanation for the intriguing phenomena observed in Zhang et al. (2017a). We also study the setting where the total loss is the sum of a bounded loss and an additional ℓ_2 regularization term. We obtain new generalization bounds for the continuous Langevin dynamics in this setting by developing a new Log-Sobolev inequality for the parameter distribution at any time. Our new bounds are more desirable when the noise level of the process is not small, and do not become vacuous even when T tends to infinity.

Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay

Generalization error of spectral algorithms

On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law Decay

Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Rényi’s Entropy Perspective

A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Generalization in Kernel Regression Under Realistic Assumptions

Spectral algorithms for functional linear regression

Generalization Error Analysis of Neural networks with Gradient Based Regularization

Automated Spectral Kernel Learning

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

Kernel interpolation generalizes poorly

Generalization for Least Squares Regression With Simple Spiked Covariances

Error Analysis of Kernel/GP Methods for Nonlinear and Parametric PDEs

Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime

Generalization Error of Generalized Linear Models in High Dimensions

Kernel regression, minimax rates and effective dimensionality: beyond the regular case

On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains