Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss

Yahong Yang, Juncai He
2024-05-12
Abstract: Constructing the architecture of a neural network is a challenging pursuit for the machine learning community, and the dilemma of whether to go deeper or wider remains a persistent question. This paper explores a comparison between deeper neural networks (DeNNs) with a flexible number of layers and wider neural networks (WeNNs) with limited hidden layers, focusing on their optimal generalization error in Sobolev losses. Analytical investigations reveal that the architecture of a neural network can be significantly influenced by various factors, including the number of sample points, parameters within the neural networks, and the regularity of the loss function. Specifically, a higher number of parameters tends to favor WeNNs, while an increased number of sample points and greater regularity in the loss function lean towards the adoption of DeNNs. We ultimately apply this theory to address partial differential equations using deep Ritz and physics-informed neural network (PINN) methods, guiding the design of neural networks.
Machine Learning, Numerical Analysis
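The comparison above is phrased in terms of a parameter budget. As a hypothetical illustration (not taken from the paper), the sketch below counts the weights and biases of a plain fully connected network whose hidden layers all have the same width, and finds the largest width that fits a given budget. This shows how one budget can buy either a very wide shallow network or a narrow deep one. The helper names `fc_param_count` and `width_for_budget` and the equal-width architecture are assumptions; the networks analyzed in the paper may be parameterized differently.

```python
# Hypothetical helpers (not from the paper). They assume a plain fully
# connected network with `depth` hidden layers, all of size `width`.

def fc_param_count(d_in: int, width: int, depth: int, d_out: int = 1) -> int:
    """Total number of weights and biases in the assumed network."""
    if depth < 1:
        raise ValueError("need at least one hidden layer")
    first = d_in * width + width                    # input -> first hidden layer
    hidden = (depth - 1) * (width * width + width)  # hidden -> hidden layers
    last = width * d_out + d_out                    # last hidden -> output
    return first + hidden + last


def width_for_budget(d_in: int, depth: int, budget: int, d_out: int = 1) -> int:
    """Largest equal width whose parameter count stays within `budget`."""
    w = 0
    while fc_param_count(d_in, w + 1, depth, d_out) <= budget:
        w += 1
    return w


if __name__ == "__main__":
    for depth in (1, 10):
        w = width_for_budget(d_in=2, depth=depth, budget=10_000)
        print(depth, w, fc_param_count(2, w, depth))
```

For example, with 2 inputs and a budget of 10,000 parameters, one hidden layer allows a width of 2,499 (9,997 parameters), while ten hidden layers allow a width of only 32 (9,633 parameters).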
What problem does this paper attempt to address?
This paper compares the optimal generalization error of deeper neural networks (DeNNs) and wider neural networks (WeNNs) trained with a Sobolev loss. It finds that the best architecture depends on the number of sample points, the number of network parameters, and the regularity of the loss function, that is, the highest order of derivatives the Sobolev loss includes. A large parameter budget favors WeNNs, while more sample points and a more regular loss favor DeNNs.

Through theoretical analysis, the paper argues that WeNNs suit tasks where a shallow architecture suffices, while DeNNs excel at representing complex functions and computations. DeNNs can reach high approximation accuracy with comparatively few parameters, a faster approximation rate than shallow networks achieve at the same parameter count (sometimes called super-convergence). The price is that deeper networks are harder to train and need more sample points to capture the complex relationships in the data.

The aim of the paper is to analyze the optimal generalization error of both types of network in the Sobolev training setting. It decomposes the generalization error into an approximation error and a sampling error and compares DeNNs and WeNNs under different loss functions. When sample points are plentiful but parameters are limited, DeNNs outperform WeNNs; when parameters are plentiful but sample points are limited, WeNNs are the better choice. Moreover, as the derivative order in the loss function increases, the region in which DeNNs have the advantage expands.

In summary, the paper offers guidance for choosing a network architecture, emphasizing that depth and width should be balanced according to the available numbers of sample points and parameters.
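To make the approximation/sampling split concrete, here is the standard textbook decomposition for an empirical minimizer, written for an $H^n$ Sobolev loss on a domain $\Omega$ with $M$ uniformly drawn sample points. The notation ($f$, $\mathcal{F}$, $\mathcal{R}$, $\mathcal{R}_S$, $\phi_S$, $\bar\phi$) is assumed for illustration, and the paper's precise setting and bounds may differ. Evaluating $D^\alpha$ at sample points assumes $f$ and the networks are smooth enough for pointwise derivatives to make sense.

```latex
% Population and empirical Sobolev losses (assumed notation)
\mathcal{R}(\phi) = \|\phi - f\|_{H^n(\Omega)}^2
  = \sum_{|\alpha| \le n} \int_\Omega |D^\alpha(\phi - f)(x)|^2 \, dx,
\qquad
\mathcal{R}_S(\phi) = \frac{|\Omega|}{M} \sum_{i=1}^{M} \sum_{|\alpha| \le n} |D^\alpha(\phi - f)(x_i)|^2 .

% Let \phi_S minimize \mathcal{R}_S over the network class \mathcal{F}; fix any \bar\phi \in \mathcal{F}.
\mathcal{R}(\phi_S)
  = \underbrace{\mathcal{R}(\phi_S) - \mathcal{R}_S(\phi_S)}_{\le \sup_{\phi \in \mathcal{F}} |\mathcal{R}(\phi) - \mathcal{R}_S(\phi)|}
  + \underbrace{\mathcal{R}_S(\phi_S) - \mathcal{R}_S(\bar\phi)}_{\le 0}
  + \underbrace{\mathcal{R}_S(\bar\phi) - \mathcal{R}(\bar\phi)}_{\le \sup_{\phi \in \mathcal{F}} |\mathcal{R}(\phi) - \mathcal{R}_S(\phi)|}
  + \mathcal{R}(\bar\phi).

% Taking the infimum over \bar\phi \in \mathcal{F}:
\mathcal{R}(\phi_S)
  \le \underbrace{\inf_{\phi \in \mathcal{F}} \mathcal{R}(\phi)}_{\text{approximation error}}
  + \underbrace{2 \sup_{\phi \in \mathcal{F}} |\mathcal{R}(\phi) - \mathcal{R}_S(\phi)|}_{\text{sampling error}} .
```

In this reading, a richer network class $\mathcal{F}$ (for example, a deeper one at the same parameter count) can shrink the first term while making the second term harder to control with a fixed number of samples $M$, which matches the trade-off described above.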