Recent advances in deep learning theory

Fengxiang He, Dacheng Tao
DOI: https://doi.org/10.48550/arXiv.2012.10931
2021-03-11
Abstract: Deep learning is usually described as an experiment-driven field under continuous criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature, which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns in ethics and security and their relationships with generalizability.
Machine Learning
What problem does this paper attempt to address?
The problem this paper attempts to address is the lack of a solid theoretical foundation for deep learning. Specifically:

1. **Lack of a solid theoretical foundation**: Despite the tremendous success of deep learning in many practical applications, many aspects of its mechanisms remain unknown. Many existing intuitions and heuristic methods, while achieving excellent performance, can also be very unstable in some cases. This "black box" nature brings unknown risks to the application of deep learning, especially in fields with high safety requirements (such as autonomous driving, medical diagnosis, and drug discovery).
2. **Theoretical explanation of generalization ability**: Generalization ability refers to whether a model can perform well on unseen data when it performs well on training data. Since training data cannot cover all future situations, good generalization ability is crucial for handling unseen events, especially when long-tail events can trigger fatal disasters.
3. **Limitations of traditional statistical learning theory tools**: Traditional statistical learning theory tools (such as VC dimension, Rademacher complexity, etc.) usually establish generalization bounds based on the complexity of the hypothesis space. However, these tools are powerless when dealing with deep learning models because the number of parameters in deep learning models is usually very large, making the generalization bounds vacuous.
4. **Impact of over-parameterization**: The over-parameterization of deep neural networks (i.e., the number of model parameters far exceeds the number of training samples) is a major obstacle to establishing meaningful generalization bounds. However, recent research has shown that over-parameterization can bring benefits in some cases, such as making the loss function surface smoother and even nearly convex.

To address these issues, the paper reviews and organizes recent advances in deep learning theory, dividing them into six main categories:

1. **Complexity and capacity methods**: Analyzing the generalization ability of deep learning.
2. **Stochastic differential equations**: Modeling stochastic gradient descent and its variants, studying optimization and generalization.
3. **Geometric structure of the loss landscape**: Studying the trajectories of dynamical systems.
4. **Role of over-parameterization**: Exploring the impact of over-parameterization on deep neural networks from both positive and negative perspectives.
5. **Theoretical foundations of specific network architectures**: Including convolutional neural networks, recurrent neural networks, etc.
6. **Ethical and safety issues**: Exploring the relationship with generalization ability.

Through these classifications, the paper aims to provide a comprehensive review of the development of deep learning theory and offer new insights for future research.
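The point about vacuous generalization bounds can be made concrete with a rough numerical sketch. The code below is an illustration only, not a bound taken from the paper: it assumes a schematic capacity term of order sqrt(d / n), with constants and logarithmic factors dropped, and it uses the parameter count d as a stand-in for the complexity of the hypothesis space. The function names and the two example settings are hypothetical. Since the gap between training and test 0-1 error can never exceed 1, any bound of at least 1 says nothing.

```python
import math


def capacity_term(num_params: int, num_samples: int) -> float:
    """Schematic capacity term sqrt(d / n).

    Assumption: constants and log factors are dropped, and the parameter
    count d stands in for the complexity of the hypothesis space.
    """
    return math.sqrt(num_params / num_samples)


def is_vacuous(bound: float) -> bool:
    """A bound on a 0-1 error gap that is >= 1 carries no information."""
    return bound >= 1.0


# Hypothetical settings: (label, parameter count d, training-set size n)
settings = [
    ("small linear model", 100, 50_000),
    ("over-parameterized network", 10_000_000, 50_000),
]

for label, d, n in settings:
    term = capacity_term(d, n)
    print(f"{label}: sqrt(d/n) = {term:.3f}, vacuous = {is_vacuous(term)}")
```

Under these assumptions the small model gets an informative term well below 1, while the over-parameterized network gets a term far above 1, which is the sense in which parameter-count-based bounds stop being useful for deep networks.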