Exploring the loss landscape of regularized neural networks via convex duality

Sungyoon Kim, Aaron Mishkin, Mert Pilanci
2024-11-12
Abstract: We discuss several aspects of the loss landscape of regularized neural networks: the structure of stationary points, connectivity of optimal solutions, path with nonincreasing loss to arbitrary global optimum, and the nonuniqueness of optimal solutions, by casting the problem into an equivalent convex problem and considering its dual. Starting from two-layer neural networks with scalar output, we first characterize the solution set of the convex problem using its dual and further characterize all stationary points. With the characterization, we show that the topology of the global optima goes through a phase transition as the width of the network changes, and construct counterexamples where the problem may have a continuum of optimal solutions. Finally, we show that the solution set characterization and connectivity results can be extended to different architectures, including two-layer vector-valued neural networks and parallel three-layer neural networks.
Machine Learning
What problem does this paper attempt to address?
This paper studies several aspects of the loss landscape of regularized neural networks:

1. **Structure of stationary points**: What the stationary points of the regularized neural network loss look like.
2. **Connectivity of optimal solutions**: Whether any two global optima can be joined by a path that stays within the set of optimal solutions.
3. **Non-increasing loss paths**: Whether there is a path from a starting point to an arbitrary global optimum along which the loss never increases.
4. **Non-uniqueness of optimal solutions**: Whether the regularized problem can have multiple distinct optimal solutions.

To study these questions, the authors cast the training problem as an equivalent convex optimization problem and analyze its dual. The main contributions are:

- **Optimal polytope**: The paper revisits the fact that the convex reformulation of the regularized neural network has a polytope as its optimal set, and connects this polytope to the dual optimal solutions.
- **Ladder of connectivity**: For two-layer neural networks, the topology of the optimal solution set goes through a phase transition as the network width changes, and the critical widths are given explicitly.
- **Non-unique minimum-norm interpolator**: Under certain conditions the minimum-norm interpolation problem can have infinitely many optimal solutions, and the paper constructs an explicit counterexample.

The paper also extends these results to other architectures, such as two-layer vector-valued neural networks and parallel three-layer neural networks.
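To make the notion of a non-increasing loss path concrete, the following is a minimal sketch that evaluates a regularized loss along a discretized path and checks that it never goes up. Everything specific here is an assumption for illustration and is not taken from the paper: the squared loss, the weight-decay penalty with strength `beta`, the straight-line path between two parameter settings, and all function names. The paths constructed in the paper need not be straight lines.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Params = Tuple[Matrix, List[float]]  # (hidden weights W, output weights a)


def predict(X: Matrix, W: Matrix, a: Sequence[float]) -> List[float]:
    """Two-layer ReLU network with scalar output: f(x) = sum_j a_j * relu(w_j . x)."""
    preds = []
    for x in X:
        total = 0.0
        for w_j, a_j in zip(W, a):
            pre = sum(w_k * x_k for w_k, x_k in zip(w_j, x))
            total += a_j * max(pre, 0.0)
        preds.append(total)
    return preds


def regularized_loss(X: Matrix, y: Sequence[float], W: Matrix,
                     a: Sequence[float], beta: float) -> float:
    """Assumed objective: squared loss plus weight decay on all parameters."""
    fit = 0.5 * sum((p - t) ** 2 for p, t in zip(predict(X, W, a), y))
    sq_norm = sum(w_k ** 2 for w_j in W for w_k in w_j) + sum(a_j ** 2 for a_j in a)
    return fit + 0.5 * beta * sq_norm


def losses_along_path(X: Matrix, y: Sequence[float], start: Params, end: Params,
                      beta: float, steps: int = 20) -> List[float]:
    """Loss at steps + 1 evenly spaced points on the straight line from start to end."""
    (W0, a0), (W1, a1) = start, end
    losses = []
    for s in range(steps + 1):
        t = s / steps
        W = [[(1 - t) * p + t * q for p, q in zip(r0, r1)] for r0, r1 in zip(W0, W1)]
        a = [(1 - t) * p + t * q for p, q in zip(a0, a1)]
        losses.append(regularized_loss(X, y, W, a, beta))
    return losses


def is_nonincreasing(values: Sequence[float], tol: float = 1e-12) -> bool:
    """True if no value exceeds its predecessor by more than tol."""
    return all(b <= a + tol for a, b in zip(values, values[1:]))


# Toy usage: one data point, one hidden neuron, path from zero weights to (0.5, 0.5).
X_toy, y_toy = [[1.0]], [1.0]
toy_losses = losses_along_path(X_toy, y_toy, ([[0.0]], [0.0]), ([[0.5]], [0.5]), beta=0.1)
print(is_nonincreasing(toy_losses))
```

A check like this only inspects finitely many points, so it can expose an increase along a path but cannot prove that a continuous path is non-increasing.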
These results aid theoretical understanding and can also inform the design of practical algorithms. For example, the characterization of the optimal solution set can be used to design an algorithm that searches for neural networks with the same optimal cost.

### Formula examples

- **Dual of the convex problem**:
\[ \max_{\nu}\; -L^*(\nu) \quad \text{s.t.} \quad |\nu^T (Xu)_+| \leq \beta \quad \forall\, \|u\|_2 \leq 1 \]
where \(L^*\) is the convex conjugate of \(L(\cdot, y)\), \(\nu\) is the dual variable, \((\cdot)_+\) is the ReLU (positive part), and \(\beta\) is the regularization parameter.
- **Definition of the optimal polytope**:
\[ P^*_{\nu^*} = \left\{ (c_i \bar{u}_i, d_i \bar{v}_i)_{i=1}^P \;\middle|\; c_i, d_i \geq 0 \;\; \forall i \in [P], \;\; \sum_{i=1}^P \left( D_i X \bar{u}_i c_i - D_i X \bar{v}_i d_i \right) = y^* \right\} \subseteq \mathbb{R}^{2dP} \]
where \(y^* = \sum_{i=1}^P D_i X (u_i^* - v_i^*)\) is the optimal model fit, and the subscript indicates that the set is described through a dual optimal solution \(\nu^*\).
- **Ladder of connectivity** (the optimal set restricted to at most \(m\) nonzero blocks):
\[ P^*(m) = \left\{ (u_i, v_i)_{i=1}^P \;\middle|\; (u_i, v_i)_{i=1}^P \in P^*, \;\; \sum_{i=1}^P \left( \mathbf{1}(u_i \neq 0) + \mathbf{1}(v_i \neq 0) \right) \leq m \right\} \subseteq \mathbb{R}^{2dP} \]

Together, these formulas capture the core objects of the paper: the dual problem, the optimal polytope, and its width-constrained subsets.
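Two of the quantities in the formulas above are easy to compute numerically, and the following sketch shows one way to do so. It rests on assumptions that are not from the paper: the dual constraint is read as holding for every \(u\) in the unit ball, so it is probed on a finite list of unit-norm candidates, and a solution is stored as a list of `(u_i, v_i)` pairs. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Block = Tuple[Sequence[float], Sequence[float]]  # one (u_i, v_i) pair


def dual_constraint_value(X: Matrix, nu: Sequence[float], u: Sequence[float]) -> float:
    """Compute |nu^T (X u)_+| for a single direction u."""
    activations = [max(sum(x_k * u_k for x_k, u_k in zip(row, u)), 0.0) for row in X]
    return abs(sum(n * act for n, act in zip(nu, activations)))


def max_over_candidates(X: Matrix, nu: Sequence[float],
                        candidates: Sequence[Sequence[float]]) -> float:
    """Largest constraint value over a finite candidate set (a lower bound on the true max)."""
    return max(dual_constraint_value(X, nu, u) for u in candidates)


def unit_circle(n: int) -> List[List[float]]:
    """n evenly spaced unit vectors in the plane, usable as candidates when d = 2."""
    return [[math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)] for k in range(n)]


def cardinality(blocks: Sequence[Block], tol: float = 0.0) -> int:
    """Count nonzero u_i and v_i blocks: sum_i 1(u_i != 0) + 1(v_i != 0)."""
    count = 0
    for u_i, v_i in blocks:
        count += int(any(abs(c) > tol for c in u_i))
        count += int(any(abs(c) > tol for c in v_i))
    return count


def within_width(blocks: Sequence[Block], m: int) -> bool:
    """Check only the cardinality condition of P*(m); optimality is not verified here."""
    return cardinality(blocks) <= m


# Toy usage with one-dimensional inputs, where u = +1 and u = -1 cover the unit sphere.
X_toy = [[1.0], [-2.0]]
nu_toy = [0.3, -0.2]
print(max_over_candidates(X_toy, nu_toy, [[1.0], [-1.0]]) <= 0.5)
```

Because the candidate set is finite, a value above \(\beta\) certifies that the constraint is violated, while a value below \(\beta\) does not certify feasibility when \(d > 1\). Likewise, `within_width` tests only the width condition; membership in \(P^*\) itself has to be established separately.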