Abstract: We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden-layer weights for a norm-regularized DNN training problem can be explicitly found as the extreme points of a convex set. For the special case of deep linear networks, we prove that each optimal weight matrix aligns with the previous layers via duality. More importantly, we apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds. As a corollary, we also prove that norm-regularized deep ReLU networks yield spline interpolation for one-dimensional datasets, a result previously known only for two-layer networks. Furthermore, we provide closed-form solutions for the optimal layer weights when data is rank-one or whitened. The same analysis also applies to architectures with batch normalization even for arbitrary data. Therefore, we obtain a complete explanation for a recent empirical observation termed Neural Collapse where class means collapse to the vertices of a simplex equiangular tight frame.
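The weight-alignment claim can be illustrated on a simplified problem. The sketch below is not the paper's duality-based derivation. It assumes the end-to-end map of a deep linear network is fixed to a matrix `A` and asks how weight decay (the sum of squared Frobenius norms) splits `A` across `L` layers. With the thin SVD `A = U S V^T`, the balanced factorization sets `W_l = Q_l S^(1/L) Q_{l-1}^T` with `Q_0 = V`, `Q_L = U`, and arbitrary orthonormal intermediate frames `Q_l`. The left singular vectors of each layer then coincide with the right singular vectors of the next one, which appears as the balance condition `W_{l+1}^T W_{l+1} = W_l W_l^T`. The penalty equals `L * sum_i sigma_i^(2/L)`; for `L = 2` this is `2 ||A||_*`, the known minimum of `||W_1||_F^2 + ||W_2||_F^2` over all factorizations `A = W_2 W_1`. The helpers `balanced_factorization`, `product`, and `penalty` are hypothetical names introduced only for this example.

```python
import numpy as np


def balanced_factorization(A, num_layers, hidden_width, rng):
    """Hypothetical helper: split A into num_layers matrices with W_L @ ... @ W_1 == A.

    Every layer carries S**(1/L) from the thin SVD A = U S V^T, and consecutive
    layers share an orthonormal frame Q_l, so the left singular vectors of W_l
    match the right singular vectors of W_{l+1} (weight alignment).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > 1e-12 * s[0]))  # numerical rank
    U, s, Vt = U[:, :r], s[:r], Vt[:r, :]
    root = np.diag(s ** (1.0 / num_layers))
    # Frames: Q_0 = V (input side), Q_L = U (output side); intermediate frames are
    # random hidden_width x r matrices with orthonormal columns (requires hidden_width >= r).
    frames = [Vt.T]
    for _ in range(num_layers - 1):
        Q, _ = np.linalg.qr(rng.standard_normal((hidden_width, r)))
        frames.append(Q)
    frames.append(U)
    # W_l = Q_l S^{1/L} Q_{l-1}^T for l = 1..L, listed from first layer to last.
    return [frames[l] @ root @ frames[l - 1].T for l in range(1, num_layers + 1)]


def product(weights):
    """Compute W_L @ ... @ W_1 for weights listed from the first layer to the last."""
    out = weights[0]
    for W in weights[1:]:
        out = W @ out
    return out


def penalty(weights):
    """Weight-decay term: sum of squared Frobenius norms of all layers."""
    return sum(np.sum(W ** 2) for W in weights)


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
L = 3
Ws = balanced_factorization(A, L, hidden_width=8, rng=rng)
s = np.linalg.svd(A, compute_uv=False)

print(np.allclose(product(Ws), A))                           # product reproduces A
print(all(np.allclose(Ws[l + 1].T @ Ws[l + 1], Ws[l] @ Ws[l].T)
          for l in range(L - 1)))                            # balance / alignment
print(np.isclose(penalty(Ws), L * np.sum(s ** (2.0 / L))))   # L * sum sigma^(2/L)

# Rescaling layers by c_l with prod(c_l) = 1 keeps the product fixed,
# but by AM-GM the penalty can only grow.
c = [2.0, 0.5, 1.0]
scaled = [ci * W for ci, W in zip(c, Ws)]
print(np.allclose(product(scaled), A), penalty(scaled) > penalty(Ws))
```

The final check covers only layer rescalings, which leave the end-to-end map unchanged: any rescaling with `prod(c_l) = 1` and some `c_l != 1` strictly increases the penalty, so the balanced, aligned configuration is the cheapest point in that family.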

What problem does this paper attempt to address?

The paper aims to reveal the structure of deep neural networks (DNNs) through convex duality theory, in particular the properties of the hidden-layer weights in regularized DNN training problems. Its main objectives are:

1. **Introducing a convex analytic framework**: To characterize a set of optimal solutions to the regularized DNN training problem, the authors introduce a framework based on the extreme points of a convex set, which makes it possible to find the optimal hidden-layer weights explicitly.
2. **Proving weight alignment in deep linear networks**: For deep linear networks, the authors prove via convex duality that each optimal weight matrix aligns with the previous layers. This result gives insight into the structure of deep linear networks.
3. **Extending to ReLU networks**: The authors apply the same characterization to deep ReLU networks with whitened data and prove that the same weight alignment holds. As a corollary, they prove that norm-regularized deep ReLU networks yield spline interpolation on one-dimensional datasets, a result previously known only for two-layer networks.
4. **Providing closed-form solutions**: When the data is rank-one or whitened, the authors give closed-form solutions for the optimal layer weights. The same analysis also applies to architectures with batch normalization, even for arbitrary data.
5. **Explaining Neural Collapse**: The authors use their framework to explain the recently observed "Neural Collapse" phenomenon, in which the class means collapse to the vertices of a simplex equiangular tight frame. This phenomenon has been widely observed in practice, but its underlying theoretical mechanism had not been fully understood.

### Summary of main contributions

- Introduced a convex analytic framework to characterize a set of optimal solutions to the regularized training problem.
- Proved the weight-alignment phenomenon in deep linear networks and in deep ReLU networks with whitened data.
- Provided closed-form solutions for the optimal layer weights when the data is rank-one or whitened.
- Explained the Neural Collapse phenomenon, in which the class means collapse to the vertices of a simplex equiangular tight frame.

Through these contributions, the paper provides a theoretical basis for understanding and optimizing the training of deep neural networks.
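The simplex equiangular tight frame (ETF) behind Neural Collapse has a standard closed-form construction. The sketch below is an illustration, not code from the paper: it builds the ETF as `M = sqrt(K / (K - 1)) * (I - 11^T / K)`, checks its defining properties (unit-norm columns, pairwise inner products of `-1 / (K - 1)`), and adds a hypothetical diagnostic, `collapse_gap`, that measures how far a set of class means is from that geometry after subtracting the global mean and normalizing each class mean.

```python
import numpy as np


def simplex_etf(num_classes):
    """Return a K x K matrix whose columns form a simplex ETF.

    Standard construction M = sqrt(K / (K - 1)) * (I - 11^T / K): the columns
    have unit norm, sum to zero, and every pair has inner product -1 / (K - 1).
    """
    K = num_classes
    return np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)


def collapse_gap(class_means):
    """Hypothetical diagnostic: largest entrywise deviation between the Gram matrix
    of the centered, normalized class means (columns of a d x K array) and the
    simplex ETF Gram matrix. A value of 0 means exact angular collapse."""
    K = class_means.shape[1]
    centered = class_means - class_means.mean(axis=1, keepdims=True)  # subtract global mean
    normalized = centered / np.linalg.norm(centered, axis=0, keepdims=True)
    M = simplex_etf(K)
    return np.max(np.abs(normalized.T @ normalized - M.T @ M))


rng = np.random.default_rng(0)
K, d = 4, 10
M = simplex_etf(K)
G = M.T @ M
print(np.allclose(np.diag(G), 1.0))                            # unit-norm columns
print(np.allclose(G[~np.eye(K, dtype=bool)], -1.0 / (K - 1)))  # equiangular

# Collapsed means: the ETF rotated into R^d, scaled, and shifted by a global mean.
Q, _ = np.linalg.qr(rng.standard_normal((d, K)))
collapsed = 3.0 * Q @ M + rng.standard_normal((d, 1))
random_means = rng.standard_normal((d, K))
print(collapse_gap(collapsed), collapse_gap(random_means))
```

Because `collapse_gap` normalizes each centered class mean, it tests only the angular part of the simplex ETF structure; checking that the centered class means also have equal norms would need a separate comparison of their lengths.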

Revealing the Structure of Deep Neural Networks via Convex Duality