What problem does this paper attempt to address?

This paper studies the connectivity of sublevel sets of the training loss of deep neural networks. Specifically, the author asks whether these sublevel sets remain connected at different widths (i.e., numbers of neurons per layer) and seeks the minimal width that guarantees connectivity.

### Background of the Paper

In deep learning, the loss landscape of a neural network is central to understanding the optimization process. A sublevel set is the set of all parameters at which the loss does not exceed a given threshold. If every sublevel set is connected, then:

1. There are no "bad" local valleys: any parameter can be joined to any point of lower loss by a continuous path along which the loss never exceeds its starting value, so no suboptimal valley can trap a descent method.
2. All global minima lie in a single global valley: any global minimum can be reached from any other without leaving the set of global minima.

### Main Contributions

The paper improves on earlier results by reducing the width required for connected sublevel sets from \(2N\) to \(N + 1\), where \(N\) is the number of training samples. Specifically:

- **Deep architecture**: if the first layer has width at least \(N + 1\) and the paper's remaining assumptions hold, every sublevel set is connected.
- **Two-layer network**: the author also shows that for a two-layer network whose first layer has width \(N\) (one neuron fewer than \(N + 1\)), sublevel sets can be disconnected. Hence the width \(N + 1\) is tight: it cannot be lowered in general without further assumptions on the data or the network.

### Mathematical Formulas

Let \(N\) be the number of training samples, \(\theta = (W_l, b_l)_{l=1}^{L}\) the network parameters, and \(\Phi(\theta)\) the training loss.
The sublevel set at level \(\alpha\) is defined as:

\[L_{\alpha}=\{\theta \mid \Phi(\theta)\leq\alpha\}\]

Intuitively, \(L_{\alpha}\) is connected if any two parameters in it can be joined by a continuous path along which the loss never exceeds \(\alpha\).

### Conclusion

The paper sharpens existing theoretical results and clarifies how the width of a neural network governs the connectivity of the sublevel sets of its training loss. This helps explain why optimization in sufficiently wide deep networks need not get trapped in bad local valleys, and it may inform the design of network architectures and optimization methods.
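The sublevel-set definition, and what it means for a sublevel set to be disconnected, can be made concrete with a one-parameter toy loss. This is a minimal sketch under stated assumptions. It is not the paper's network or its two-layer counterexample. It uses a hypothetical double-well loss \(\Phi(\theta) = (\theta^2 - 1)^2\), which has global minima at \(\theta = \pm 1\) separated by a barrier of height 1 at \(\theta = 0\). It approximates the connected components of \(L_{\alpha}\) by scanning a uniform grid. The helper names `sublevel_components_1d` and `double_well` and the grid resolution are illustrative choices. For \(\alpha < 1\) the sublevel set splits into two intervals, i.e., two separate valleys. For \(\alpha \geq 1\) it is a single interval.

```python
from typing import Callable, List, Optional, Tuple


def sublevel_components_1d(
    loss: Callable[[float], float],
    alpha: float,
    lo: float,
    hi: float,
    n_grid: int = 10_001,
) -> List[Tuple[float, float]]:
    """Approximate the connected components of {theta in [lo, hi] : loss(theta) <= alpha}.

    Hypothetical helper for a scalar parameter: it scans a uniform grid and
    groups runs of consecutive grid points that lie in the sublevel set.
    Each run is returned as a (first_point, last_point) pair.
    """
    step = (hi - lo) / (n_grid - 1)
    components: List[Tuple[float, float]] = []
    start: Optional[float] = None  # first grid point of the run currently inside the set
    prev = lo                      # previously visited grid point
    for i in range(n_grid):
        theta = lo + i * step
        inside = loss(theta) <= alpha
        if inside and start is None:
            start = theta                     # a new component begins
        elif not inside and start is not None:
            components.append((start, prev))  # the current component ends at the previous point
            start = None
        prev = theta
    if start is not None:
        components.append((start, prev))      # component that reaches the right end of the grid
    return components


def double_well(theta: float) -> float:
    """Toy loss (an assumption, not from the paper): minima at -1 and +1, barrier of height 1 at 0."""
    return (theta ** 2 - 1.0) ** 2


if __name__ == "__main__":
    for alpha in (0.01, 0.5, 1.5):
        comps = sublevel_components_1d(double_well, alpha, -2.0, 2.0)
        # Expected: 2 components for alpha = 0.01 and 0.5, 1 component for alpha = 1.5
        print(f"alpha={alpha}: {len(comps)} component(s)")
```

This toy loss has two global minima in separate valleys, which is exactly the situation that connected sublevel sets rule out. Grid scanning is only feasible for very low-dimensional toy problems. For the parameters of a real network, the paper establishes connectivity analytically rather than by numerical checks.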

A Note on Connectivity of Sublevel Sets in Deep Learning
