Abstract: Disobeying the classical wisdom of statistical learning theory, modern deep neural networks generalize well even though they typically contain millions of parameters. Recently, it has been shown that the trajectories of iterative optimization algorithms can possess fractal structures, and their generalization error can be formally linked to the complexity of such fractals. This complexity is measured by the fractal's intrinsic dimension, a quantity usually much smaller than the number of parameters in the network. Even though this perspective provides an explanation for why overparametrized networks would not overfit, computing the intrinsic dimension (e.g., for monitoring generalization during training) is a notoriously difficult task, where existing methods typically fail even in moderate ambient dimensions. In this study, we consider this problem through the lens of topological data analysis (TDA) and develop a generic computational tool that is built on rigorous mathematical foundations. By making a novel connection between learning theory and TDA, we first illustrate that the generalization error can be equivalently bounded in terms of a notion called the 'persistent homology dimension' (PHD), where, compared with prior work, our approach does not require any additional geometrical or statistical assumptions on the training dynamics. Then, by utilizing recently established theoretical results and TDA tools, we develop an efficient algorithm to estimate PHD at the scale of modern deep neural networks and further provide visualization tools to help understand generalization in deep learning. Our experiments show that the proposed approach can efficiently compute a network's intrinsic dimension in a variety of settings, which is predictive of the generalization error.
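To make the idea of estimating a persistent homology dimension concrete, the following is a minimal sketch and not the authors' implementation, which the abstract does not spell out. It assumes the standard degree-0 estimator: for a Vietoris-Rips filtration, the degree-0 lifetimes correspond (up to a constant factor) to the edge lengths of a Euclidean minimum spanning tree, and the alpha-weighted total lifetime of n sampled points is expected to scale like n^((d - alpha)/d), so the slope m of a log-log fit gives d ≈ alpha / (1 - m). The function names `mst_total_length` and `estimate_ph0_dimension`, the subsample sizes, and the use of a generic point cloud in place of optimizer iterates are all assumptions made for illustration.

```python
import math
import random


def mst_total_length(points, alpha=1.0):
    """Sum of (edge length ** alpha) over a Euclidean minimum spanning tree.

    Uses Prim's algorithm in O(n^2) time; fine for a few hundred points.
    """
    n = len(points)
    if n < 2:
        return 0.0
    best = [math.inf] * n      # best[i]: distance from point i to the current tree
    in_tree = [False] * n
    best[0] = 0.0              # start the tree at point 0 (contributes 0 to the sum)
    total = 0.0
    for _ in range(n):
        # Attach the point that is closest to the tree built so far.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** alpha
        pu = points[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(pu, points[v])
                if d < best[v]:
                    best[v] = d
    return total


def estimate_ph0_dimension(points, sizes=(100, 200, 400, 800), alpha=1.0, seed=0):
    """Estimate the degree-0 persistent homology dimension of a point cloud.

    Assumption: total lifetime E(n) ~ n^((d - alpha) / d), so the least-squares
    slope m of log E(n) against log n yields d = alpha / (1 - m).
    `points` must be a list with at least max(sizes) entries.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for n in sizes:
        sample = rng.sample(points, n)   # random subsample without replacement
        xs.append(math.log(n))
        ys.append(math.log(mst_total_length(sample, alpha)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return alpha / (1.0 - slope)
```

Applied to points drawn from a two-dimensional square embedded in a ten-dimensional space, an estimator of this kind should return a value near 2 rather than 10, which is the sense in which the intrinsic dimension can be much smaller than the ambient parameter count. In a training setting, the point cloud would presumably be a set of network weight vectors collected along the optimization trajectory.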

Optimal errors and phase transitions in high-dimensional generalized linear models

Generalization Error of Generalized Linear Models in High Dimensions

Statistical Inference for High-Dimensional Generalized Linear Models With Binary Outcomes

Generalisation error in learning with random features and the hidden manifold model

Phase transitions in nonparametric regressions

Generalization error of spectral algorithms

Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks

Towards Optimal Problem Dependent Generalization Error Bounds in Statistical Learning Theory

Bayes-optimal Learning of Deep Random Networks of Extensive-width

Multilayer neural networks with extensively many hidden units

Linear Hypothesis Testing for High Dimensional Generalized Linear Models

Optimal thresholds and algorithms for a model of multi-modal learning in high dimensions

Towards a General Theory of Infinite-Width Limits of Neural Classifiers

Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

Information-Theoretic Generalization Bounds for Deep Neural Networks

Do highly over-parameterized neural networks generalize since bad solutions are rare?

Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets

Characterizing out-of-distribution generalization of neural networks: application to the disordered Su-Schrieffer-Heeger model

How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer

The estimation error of general first order methods