Revealing the Structure of Deep Neural Networks via Convex Duality

Tolga Ergen, Mert Pilanci
DOI: https://doi.org/10.48550/arXiv.2002.09773
2021-06-12
Abstract: We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weights for a norm regularized DNN training problem can be explicitly found as the extreme points of a convex set. For the special case of deep linear networks, we prove that each optimal weight matrix aligns with the previous layers via duality. More importantly, we apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds. As a corollary, we also prove that norm regularized deep ReLU networks yield spline interpolation for one-dimensional datasets which was previously known only for two-layer networks. Furthermore, we provide closed-form solutions for the optimal layer weights when data is rank-one or whitened. The same analysis also applies to architectures with batch normalization even for arbitrary data. Therefore, we obtain a complete explanation for a recent empirical observation termed Neural Collapse where class means collapse to the vertices of a simplex equiangular tight frame.
Machine Learning
What problem does this paper attempt to address?
This paper uses convex duality to reveal the structure of regularized deep neural networks (DNNs), in particular the hidden-layer weights that are optimal for the norm-regularized training problem. Its main objectives are:

1. **Introducing a convex analytic framework**: To characterize a set of optimal solutions to the regularized DNN training problem, the authors introduce a framework in which a set of optimal hidden-layer weights can be found explicitly as the extreme points of a convex set.
2. **Proving weight alignment in deep linear networks**: For the special case of deep linear networks, the authors prove via duality that each optimal weight matrix aligns with the previous layers.
3. **Extending to ReLU networks**: The authors apply the same characterization to deep ReLU networks with whitened data and prove that the same weight alignment holds. As a corollary, they prove that norm-regularized deep ReLU networks yield spline interpolation for one-dimensional datasets, a result previously known only for two-layer networks.
4. **Providing closed-form solutions**: When the data is rank-one or whitened, the authors provide closed-form solutions for the optimal layer weights. The same analysis also applies to architectures with batch normalization, even for arbitrary data.
5. **Explaining Neural Collapse**: The authors use the framework to explain the recent empirical observation termed Neural Collapse, in which class means collapse to the vertices of a simplex equiangular tight frame. This phenomenon has been widely observed in practice, but the underlying theoretical mechanism has not been fully understood.

### Summary of main contributions

- Introduced a convex analytic framework to characterize a set of optimal solutions to the regularized training problem.
- Proved weight alignment in deep linear networks and in deep ReLU networks with whitened data.
- Provided closed-form solutions for the optimal layer weights when the data is rank-one or whitened.
- Explained Neural Collapse, in which class means collapse to the vertices of a simplex equiangular tight frame.

Together, these contributions provide a theoretical basis for understanding and optimizing the training of deep neural networks.
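
To make the term "simplex equiangular tight frame" concrete, here is a minimal sketch that builds one for K classes and computes the Gram matrix of its vertices. It uses the standard construction M = sqrt(K/(K-1)) (I - (1/K) 1 1^T) from the Neural Collapse literature. The function names `simplex_etf` and `gram` are illustrative choices and do not come from the paper, and the code illustrates only the geometric object, not the paper's derivation.

```python
import math


def simplex_etf(num_classes):
    """Return a K x K matrix (list of rows) whose columns are simplex ETF vertices.

    Standard construction: M = sqrt(K / (K - 1)) * (I - (1 / K) * ones * ones^T).
    """
    k = num_classes
    if k < 2:
        raise ValueError("a simplex ETF needs at least two classes")
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def gram(matrix):
    """Gram matrix of the columns: entry (a, b) is the inner product of columns a and b."""
    rows = len(matrix)
    cols = len(matrix[0])
    return [
        [sum(matrix[i][a] * matrix[i][b] for i in range(rows)) for b in range(cols)]
        for a in range(cols)
    ]


if __name__ == "__main__":
    k = 4
    g = gram(simplex_etf(k))
    # Every vertex has unit norm, and every pair of distinct vertices
    # has the same inner product, -1 / (K - 1).
    for row in g:
        print(["{:+.4f}".format(value) for value in row])
```

The Gram matrix has ones on the diagonal and -1/(K-1) everywhere else, which is the largest equal pairwise angle that K unit vectors can have. Neural Collapse refers to class means settling into this configuration.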