A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

Namjoon Suh, Guang Cheng

2024-09-16

Abstract: In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression (and classification in Appendix B). These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly non-convex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review papers that attempt to answer "how the neural network trained via gradient-based methods finds the solution that can generalize well on unseen data." In particular, two well-known paradigms are reviewed: the Neural Tangent Kernel (NTK) paradigm and the Mean-Field (MF) paradigm. Last but not least, we review the most recent theoretical advancements in generative models, including Generative Adversarial Networks (GANs), diffusion models, and in-context learning (ICL) in Large Language Models (LLMs), from the two perspectives reviewed previously, i.e., approximation and training dynamics.

Machine Learning, Statistics Theory

What problem does this paper attempt to address?

This paper reviews research on neural networks from the perspective of statistical theory, organized around three perspectives: approximation theory, training dynamics, and generative models.

1. **Approximation theory perspective**: This part reviews excess-risk results for neural networks in the nonparametric regression framework (classification is discussed in Appendix B). These results rely on explicit constructions of neural networks and yield fast convergence rates for the excess risk. However, the underlying analysis applies only to the global minimizer in the highly non-convex loss landscape of deep neural networks, which motivates the study of training dynamics.
2. **Training dynamics perspective**: This part focuses on "how neural networks trained by gradient-based methods find solutions that generalize well to unseen data". Two well-known paradigms are reviewed: the Neural Tangent Kernel (NTK) paradigm and the Mean-Field (MF) paradigm.
3. **Generative models**: The last part reviews the latest theoretical progress on generative models, including Generative Adversarial Networks (GANs), diffusion models, and in-context learning (ICL) in Large Language Models (LLMs), examined from the two perspectives covered earlier (approximation and training dynamics).

In summary, the paper surveys results on neural networks in approximation theory, training dynamics, and generative models from a statistical-theory perspective, with the aim of providing a systematic framework for understanding and analysis.

Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization

Deep learning: a statistical viewpoint

Recent advances in deep learning theory

A Probabilistic Theory of Deep Learning

Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics

On the Overlooked Structure of Stochastic Gradients

Explaining generalization in deep learning: progress and fundamental limits

Applying statistical learning theory to deep learning

Envisioning Future Deep Learning Theories: Some Basic Concepts and Characteristics

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion

Deep Neural Networks are Adaptive to Function Regularity and Data Distribution in Approximation and Estimation

Dynamic of Stochastic Gradient Descent with State-Dependent Noise

An Optimal Transport Analysis on Generalization in Deep Learning

Theory of Generative Deep Learning: Probe Landscape of Empirical Error via Norm Based Capacity Control

Stochastic Gradient Descent and Anomaly of Variance-flatness Relation in Artificial Neural Networks

Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes

Generative learning for nonlinear dynamics

A State-of-the-Art Survey on Deep Learning Theory and Architectures

Neural Network Approximations of Compositional Functions With Applications to Dynamical Systems