Abstract: We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. In this work, we: (1) prove the widespread existence of suboptimal local minima in the loss landscape of neural networks, and we use our theory to find examples; (2) show that small-norm parameters are not optimal for generalization; (3) demonstrate that ResNets do not conform to wide-network theories, such as the neural tangent kernel, and that the interaction between skip connections and batch normalization plays a role; (4) find that rank does not correlate with generalization or robustness in a practical setting.

What problem does this paper attempt to address?

This paper evaluates whether several widely accepted hypotheses about neural networks hold in practice. Specifically, the authors focus on four questions:

1. **Local Minima**: Many theoretical studies argue that all local minima of the neural network loss function are globally optimal or nearly so. The authors, however, prove that suboptimal local minima exist in the loss landscapes of actual neural networks, find examples of them, and explore why they arise on the loss surface of deep networks.
2. **Weight Decay and Parameter Norms**: Work inspired by Tikhonov regularization suggests that low-norm solutions generalize better, which gives an intuitive justification for simple regularizers such as weight decay. The authors' experiments show that, for modern architectures, regularizers that bias solutions toward a non-zero norm remain effective and can even improve performance.
3. **Neural Tangent Kernels and the Wide-Network Limit**: Theory shows that, in the wide-network limit, the neural tangent kernel (NTK) remains almost unchanged during training, so the training dynamics can be described as gradient descent on a convex function. The authors find that these predictions do not hold for practical networks, in particular the ResNet architecture, and show that the interaction between skip connections and batch normalization plays a role in this behavior.
4. **Rank**: Generalization theory provides performance guarantees for low-rank networks. However, the authors find that regularizers that encourage high-rank weight matrices often outperform those that promote low-rank matrices, which suggests that low-rank structure is not a key factor in the generalization of practical networks. They also study the adversarial robustness of low-rank networks and find that it is usually lower than that of the baseline or of purpose-built high-rank networks.

Overall, by testing these hypotheses empirically, the paper exposes the gap between theory and practice and offers new insight into the actual behavior of deep learning.
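To make the second point concrete, the sketch below contrasts standard weight decay, which pulls the parameter norm toward zero, with a penalty that pulls the squared norm toward a non-zero target. The exact form of the second penalty, the function names, and the values of `lam` and `mu` are illustrative assumptions and are not taken from the paper.

```python
def squared_norm(theta):
    """Squared Euclidean norm of a flat list of parameters."""
    return sum(w * w for w in theta)

def weight_decay_penalty(theta, lam):
    """Standard L2 penalty lam * ||theta||^2; smallest at theta = 0."""
    return lam * squared_norm(theta)

def norm_bias_penalty(theta, lam, mu):
    """Hypothetical penalty lam * | ||theta||^2 - mu^2 |.

    It is zero when ||theta|| equals the target norm mu and grows as the
    norm moves away from mu in either direction (assumed form).
    """
    return lam * abs(squared_norm(theta) - mu * mu)

theta = [3.0, 4.0]  # ||theta|| = 5

wd_at_theta = weight_decay_penalty(theta, lam=0.1)
nb_at_theta = norm_bias_penalty(theta, lam=0.1, mu=5.0)
nb_at_zero = norm_bias_penalty([0.0, 0.0], lam=0.1, mu=5.0)
```

With these toy values, weight decay still penalizes `theta`, while the norm-targeting penalty vanishes at `theta` and instead penalizes the all-zero solution. In a training loop, either term would simply be added to the data loss.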

Truth or Backpropaganda? An Empirical Investigation of Deep Learning Theory
