Abstract: We discuss several aspects of the loss landscape of regularized neural networks: the structure of stationary points, connectivity of optimal solutions, path with nonincreasing loss to arbitrary global optimum, and the nonuniqueness of optimal solutions, by casting the problem into an equivalent convex problem and considering its dual. Starting from two-layer neural networks with scalar output, we first characterize the solution set of the convex problem using its dual and further characterize all stationary points. With the characterization, we show that the topology of the global optima goes through a phase transition as the width of the network changes, and construct counterexamples where the problem may have a continuum of optimal solutions. Finally, we show that the solution set characterization and connectivity results can be extended to different architectures, including two-layer vector-valued neural networks and parallel three-layer neural networks.

What problem does this paper attempt to address?

This paper addresses several aspects of the loss landscape of regularized neural networks:

1. **Structure of stationary points**: Characterize the stationary points of the regularized neural network loss function.
2. **Connectivity of optimal solutions**: Determine whether any two global optima can be joined by a path that stays within the set of optimal solutions.
3. **Non-increasing loss paths**: Find paths from an initial point to an arbitrary global optimum along which the loss never increases.
4. **Non-uniqueness of optimal solutions**: Analyze when a regularized neural network has more than one optimal solution.

To study these questions, the authors recast the training problem as an equivalent convex optimization problem and analyze its dual. The main contributions of the paper are:

- **Optimal polytope**: Re-examine the fact that the optimal set of the convex reformulation of the regularized neural network is a polytope, and relate this polytope to the dual optimal solutions.
- **Ladder of connectivity**: For a two-layer neural network, show that the topology of the optimal solution set undergoes a phase transition as the network width changes, and give the specific critical widths.
- **Non-unique minimum-norm interpolator**: Show that the minimum-norm interpolation problem can have infinitely many optimal solutions under certain conditions, and construct a specific counterexample.

In addition, the paper extends these results to other network architectures, such as two-layer vector-valued neural networks and parallel three-layer neural networks.
These results aid theoretical understanding and can also guide the design of practical algorithms. For example, the characterization of the optimal solution set can be used to design an algorithm that searches for neural networks with the same optimal cost.

### Formula examples

- **Dual of the convex optimization problem**:
  \[ \max_{\nu} \; -L^*(\nu) \quad \text{s.t.} \quad |\nu^T (Xu)_+| \leq \beta \quad \forall \, \|u\|_2 \leq 1 \]
  where \(L^*\) is the convex conjugate of \(L(\cdot, y)\) and \(\nu\) is the dual variable.
- **Definition of the optimal polytope**:
  \[ P^*_{\nu^*} = \left\{ (c_i \bar{u}_i, d_i \bar{v}_i)_{i=1}^P \;\middle|\; c_i, d_i \geq 0 \ \forall i \in [P], \ \sum_{i=1}^P \left( D_i X \bar{u}_i c_i - D_i X \bar{v}_i d_i \right) = y^* \right\} \subseteq \mathbb{R}^{2dP} \]
  where \(y^* = \sum_{i=1}^P D_i X (u_i^* - v_i^*)\) is the optimal model fit.
- **Ladder of connectivity**: the optimal solutions with at most \(m\) nonzero blocks,
  \[ P^*(m) = \left\{ (u_i, v_i)_{i=1}^P \;\middle|\; (u_i, v_i)_{i=1}^P \in P^*, \ \sum_{i=1}^P \left( 1(u_i \neq 0) + 1(v_i \neq 0) \right) \leq m \right\} \subseteq \mathbb{R}^{2dP} \]

Together, these formulas capture the core objects the paper uses to describe the loss landscape of regularized neural networks.
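The objects in these formulas can be made concrete with a small numerical sketch. The summary does not define \(D_i\); the code below assumes the usual convention from the convex-reformulation literature, \(D_i = \mathrm{diag}(1[Xu \geq 0])\) for some direction \(u\), and assumes the dual constraint is meant to hold for every \(u\) in the unit ball. All function names (`relu_pattern`, `sample_patterns`, `cardinality`, `dual_violation`) and the toy data are hypothetical and are not taken from the paper. Random sampling only finds a subset of the activation patterns and can only certify dual infeasibility, never feasibility.

```python
import math
import random


def relu_pattern(X, u):
    """Diagonal of D = diag(1[Xu >= 0]) as a tuple of 0/1 entries (assumed convention)."""
    return tuple(
        1 if sum(x * w for x, w in zip(row, u)) >= 0 else 0
        for row in X
    )


def sample_patterns(X, num_samples=1000, seed=0):
    """Collect the distinct activation patterns hit by random Gaussian directions.

    Sampling may miss patterns, so this returns a subset of all D_i.
    """
    rng = random.Random(seed)
    d = len(X[0])
    found = set()
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        found.add(relu_pattern(X, u))
    return sorted(found)


def cardinality(solution, tol=1e-12):
    """Number of nonzero blocks: sum_i 1(u_i != 0) + 1(v_i != 0)."""
    def nonzero(w):
        return any(abs(x) > tol for x in w)
    return sum(int(nonzero(u)) + int(nonzero(v)) for u, v in solution)


def dual_violation(X, nu, num_samples=1000, seed=0):
    """Largest sampled value of |nu^T (Xu)_+| over random unit vectors u.

    A value above beta certifies infeasibility; a value below beta does not
    prove feasibility, because only finitely many u are tested.
    """
    rng = random.Random(seed)
    d = len(X[0])
    worst = 0.0
    for _ in range(num_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm == 0.0:
            continue  # skip the degenerate zero direction
        u = [x / norm for x in g]
        relu = [max(0.0, sum(x * w for x, w in zip(row, u))) for row in X]
        worst = max(worst, abs(sum(n * r for n, r in zip(nu, relu))))
    return worst


# Toy data: three samples in one dimension (illustrative only).
X = [[1.0], [-1.0], [2.0]]
patterns = sample_patterns(X)

# A made-up list of (u_i, v_i) blocks, used only to illustrate the count.
solution = [([0.5], [0.0]), ([0.0], [0.0])]
within_width_budget = cardinality(solution) <= 1  # cardinality test with m = 1

worst = dual_violation(X, [0.1, 0.2, -0.1])
```

Here `within_width_budget` checks only the cardinality condition in \(P^*(m)\); membership in \(P^*\) itself would additionally require solving the convex problem, which this sketch does not attempt.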

Exploring the loss landscape of regularized neural networks via convex duality

The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models

The loss landscape of deep linear neural networks: a second-order analysis

The Loss Surface of Deep Linear Networks Viewed Through the Algebraic Geometry Lens

Emergent properties of the local geometry of neural loss landscapes

Visualizing the Loss Landscape of Neural Nets

Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escaping, and Network Embedding

Lossless Convexification and Duality

Deep Loss Convexification for Learning Iterative Models

Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks

The Multiscale Structure of Neural Network Loss Functions: The Effect on Optimization and Origin

Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks With Soft-Thresholding

Exploring the Geometry and Topology of Neural Network Loss Landscapes

Strong Duality Relations in Nonconvex Risk-Constrained Learning

Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes

Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization

Breaking the Curse of Dimensionality with Convex Neural Networks

Embedding Principle of Loss Landscape of Deep Neural Networks

On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems

On the curvature of the loss landscape

Black holes and the loss landscape in machine learning