Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss

Yahong Yang, Juncai He

2024-05-12

Abstract: Constructing the architecture of a neural network is a challenging pursuit for the machine learning community, and the dilemma of whether to go deeper or wider remains a persistent question. This paper explores a comparison between deeper neural networks (DeNNs) with a flexible number of layers and wider neural networks (WeNNs) with limited hidden layers, focusing on their optimal generalization error in Sobolev losses. Analytical investigations reveal that the architecture of a neural network can be significantly influenced by various factors, including the number of sample points, parameters within the neural networks, and the regularity of the loss function. Specifically, a higher number of parameters tends to favor WeNNs, while an increased number of sample points and greater regularity in the loss function lean towards the adoption of DeNNs. We ultimately apply this theory to address partial differential equations using deep Ritz and physics-informed neural network (PINN) methods, guiding the design of neural networks.

Machine Learning, Numerical Analysis

What problem does this paper attempt to address?

This paper compares the optimal generalization error of deeper neural networks (DeNNs) and wider neural networks (WeNNs) trained with Sobolev loss functions. It finds that the preferred architecture depends on the number of sample points, the number of network parameters, and the regularity of the loss function: a larger number of parameters favors WeNNs, while more sample points and higher regularity in the loss function favor DeNNs.

The theoretical analysis indicates that WeNNs perform well in tasks suited to a shallower architecture, while DeNNs excel at complex representations and computations. DeNNs can achieve high-precision approximation even with fewer parameters. However, as network depth increases, training becomes more complex and more sample points are needed to capture the complex relationships in the data.

The paper aims to analyze the optimal generalization error of both network types in the Sobolev training setting. It decomposes the generalization error into an approximation error and a sampling error, and compares DeNNs and WeNNs under different loss functions. With ample sample points but limited parameters, DeNNs outperform WeNNs; with more parameters but limited sample points, WeNNs are more suitable. As the derivative order in the loss function increases, the region in which DeNNs have the advantage expands.

In summary, the paper provides guidance for selecting a neural network architecture, emphasizing that depth and width should be balanced against the available number of sample points and parameters.
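To make the notion of a Sobolev loss concrete, the following is a minimal sketch of an empirical H^1-type loss evaluated on M random sample points, next to a plain L^2 loss. It is an illustration only and is not taken from the paper: the function names `sobolev_h1_loss` and `l2_loss`, the target `sin`, the toy "model" `u`, and the choices of `eps`, `k`, and M = 1000 are all assumptions. The derivatives are supplied in closed form rather than computed by automatic differentiation.

```python
import math
import random


def l2_loss(u, f, points):
    """Empirical L^2 loss: mean squared mismatch in function values only."""
    return sum((u(x) - f(x)) ** 2 for x in points) / len(points)


def sobolev_h1_loss(u, du, f, df, points):
    """Empirical H^1-type loss: value mismatch plus first-derivative mismatch."""
    total = 0.0
    for x in points:
        total += (u(x) - f(x)) ** 2 + (du(x) - df(x)) ** 2
    return total / len(points)


# Hypothetical target function and its derivative.
f = math.sin
df = math.cos

# Hypothetical "model": the target plus a small, highly oscillatory error.
eps, k = 0.01, 50.0
u = lambda x: math.sin(x) + eps * math.sin(k * x)
du = lambda x: math.cos(x) + eps * k * math.cos(k * x)

# M = 1000 sample points drawn uniformly from [0, 1] (assumed domain).
random.seed(0)
points = [random.uniform(0.0, 1.0) for _ in range(1000)]

l2_val = l2_loss(u, f, points)
h1_val = sobolev_h1_loss(u, du, f, df, points)
print("L2 loss:", l2_val)
print("H1 loss:", h1_val)
```

In this toy setting the value error is tiny, but the derivative error is amplified by the frequency `k`, so the H^1 loss is far larger than the L^2 loss. This is one way to see why a loss with higher derivative order places stronger demands on the network, which is the regime in which the summary above reports a growing advantage for DeNNs.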
