Abstract:Due to the non-convex nature of training Deep Neural Network (DNN) models, their effectiveness relies on the use of non-convex optimization heuristics. Traditional methods for training DNNs often require costly empirical methods to produce successful models and do not have a clear theoretical foundation. In this study, we examine the use of convex optimization theory and sparse recovery models to refine the training process of neural networks and provide a better interpretation of their optimal weights. We focus on training two-layer neural networks with piecewise linear activations and demonstrate that they can be formulated as a finite-dimensional convex program. These programs include a regularization term that promotes sparsity, which constitutes a variant of group Lasso. We first utilize semi-infinite programming theory to prove strong duality for finite width neural networks and then we express these architectures equivalently as high dimensional convex sparse recovery models. Remarkably, the worst-case complexity to solve the convex program is polynomial in the number of samples and number of neurons when the rank of the data matrix is bounded, which is the case in convolutional networks. To extend our method to training data of arbitrary rank, we develop a novel polynomial-time approximation scheme based on zonotope subsampling that comes with a guaranteed approximation ratio. We also show that all the stationary of the nonconvex training objective can be characterized as the global optimum of a subsampled convex program. Our convex models can be trained using standard convex solvers without resorting to heuristics or extensive hyper-parameter tuning unlike non-convex methods. Through extensive numerical experiments, we show that convex models can outperform traditional non-convex methods and are not sensitive to optimizer hyperparameters.

Nonconvex Sparse Regularization for Deep Neural Networks and Its Optimality

Convergence of Projected Subgradient Method with Sparse or Low-Rank Constraints

Sparse deep neural networks for nonparametric estimation in high-dimensional sparse regression

Sparse-penalized deep neural networks estimator under weak dependence

Robust Sparse Recovery Via Non-Convex Optimization

A Non-convex Approach for Sparse Recovery with Convergence Guarantee

High-dimensional Inference Via Lipschitz Sparsity-Yielding Regularizers.

Sparse Learning for Large-Scale and High-Dimensional Data: A Randomized Convex-Concave Optimization Approach

Nonconvex Sparse Representation With Slowly Vanishing Gradient Regularizers

Scaled minimax optimality in high-dimensional linear regression: A non-convex algorithmic regularization approach

Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification

The Convergence Guarantees of a Non-Convex Approach for Sparse Recovery.

Improving Spiking Sparse Recovery via Non-Convex Penalties

Nonparametric regression using over-parameterized shallow ReLU neural networks

Nonconvex Sparse Logistic Regression with Weakly Convex Regularization

The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models

Statistical learning by sparse deep neural networks

A Semismooth Newton Algorithm for High-Dimensional Nonconvex Sparse Learning

Efficient Sparse Recovery Via Adaptive Non-Convex Regularizers with Oracle Property

Analysis of the rate of convergence of fully connected deep neural network regression estimates with smooth activation function

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution