A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

Namjoon Suh, Guang Cheng
2024-09-16
Abstract: In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression (and classification in Appendix B). These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly non-convex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review papers that attempt to answer "how the neural network trained via gradient-based methods finds the solution that can generalize well on unseen data." In particular, two well-known paradigms are reviewed: the Neural Tangent Kernel (NTK) paradigm and the Mean-Field (MF) paradigm. Last but not least, we review the most recent theoretical advancements in generative models, including Generative Adversarial Networks (GANs), diffusion models, and in-context learning (ICL) in Large Language Models (LLMs), from the two perspectives reviewed previously, i.e., approximation and training dynamics.
Machine Learning, Statistics Theory
What problem does this paper attempt to address?
This paper reviews research on neural networks from the perspective of statistical theory, focusing on three areas: approximation theory, training dynamics, and generative models. Specifically:

1. **Approximation theory perspective**: This part reviews excess-risk results for neural networks in the nonparametric regression framework (classification is discussed in Appendix B). These results depend on explicit constructions of neural networks and yield fast convergence rates of the excess risk. However, the underlying analysis applies only to the global minimizer in the highly non-convex loss landscape of deep neural networks, which motivates the review of training dynamics.
2. **Training dynamics perspective**: This part focuses on "how neural networks trained by gradient-based methods find solutions that generalize well to unseen data". Two well-known paradigms are reviewed: the Neural Tangent Kernel (NTK) paradigm and the Mean-Field (MF) paradigm.
3. **Generative models**: The last part reviews recent theoretical progress on generative models, including Generative Adversarial Networks (GANs), diffusion models, and in-context learning (ICL) in Large Language Models (LLMs), examined from the first two perspectives (approximation and training dynamics).

In summary, the paper surveys and organizes statistical-theory results on neural networks across approximation theory, training dynamics, and generative models, with the aim of providing a systematic framework for understanding and analysis.