CRONOS: Enhancing Deep Learning with Scalable GPU Accelerated Convex Neural Networks

Miria Feng, Zachary Frangella, Mert Pilanci
2024-11-02
Abstract: We introduce the CRONOS algorithm for convex optimization of two-layer neural networks. CRONOS is the first algorithm capable of scaling to high-dimensional datasets such as ImageNet, which are ubiquitous in modern deep learning. This significantly improves upon prior work, which has been restricted to downsampled versions of MNIST and CIFAR-10. Taking CRONOS as a primitive, we then develop a new algorithm called CRONOS-AM, which combines CRONOS with alternating minimization, to obtain an algorithm capable of training multi-layer networks with arbitrary architectures. Our theoretical analysis proves that CRONOS converges to the global minimum of the convex reformulation under mild assumptions. In addition, we validate the efficacy of CRONOS and CRONOS-AM through extensive large-scale numerical experiments with GPU acceleration in JAX. Our results show that CRONOS-AM can obtain comparable or better validation accuracy than predominant tuned deep learning optimizers on vision and language tasks with benchmark datasets such as ImageNet and IMDb. To the best of our knowledge, CRONOS is the first algorithm which utilizes the convex reformulation to enhance performance on large-scale learning tasks.
Machine Learning, Optimization and Control
What problem does this paper attempt to address?
This paper addresses the optimization challenges that arise when training deep neural networks on large-scale datasets. Specifically:

1. **Non-convex Optimization Challenges**: Training deep neural networks (DNNs) usually involves non-convex optimization problems, which makes finding a globally optimal solution very difficult. Existing stochastic first-order optimizers (such as SGD and Adam) are only guaranteed to find approximate stationary points, and these points may be far from optimal.
2. **Hyperparameter Tuning**: Current optimization methods (such as SGD and Adam) require extensive hyperparameter tuning, which is time-consuming, and their performance is inconsistent across tasks and datasets. As model scale increases, the number of hyperparameters also increases, which makes the problem worse.
3. **Processing Capacity for Large-scale Datasets**: Although existing convex optimization methods have strong theoretical convergence guarantees, in practice they struggle with large-scale datasets such as ImageNet because their computational cost is high on high-dimensional data.

To address these problems, the paper proposes the CRONOS algorithm and its extension, CRONOS-AM. CRONOS reformulates the training problem of a two-layer ReLU neural network as a convex optimization problem and solves it efficiently with the Alternating Direction Method of Multipliers (ADMM). CRONOS-AM combines CRONOS with alternating minimization so that it can train multi-layer neural networks. CRONOS provably converges to the global minimum of the convex reformulation under mild assumptions, and in the reported experiments CRONOS-AM achieves validation accuracy comparable to or better than tuned standard optimizers on large-scale benchmarks such as ImageNet and IMDb.
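
To make the idea of "convex reformulation plus ADMM" concrete, the following is a minimal sketch and not the CRONOS algorithm itself. It rests on several assumptions that do not come from the paper summary: it uses a simplified "gated ReLU" variant in which activation patterns are sampled from random directions and the constraints tying each pattern to its weights are dropped, it uses a squared loss with a group-lasso penalty, it solves the linear system densely, and it is written in NumPy rather than JAX for brevity. All function names (`sample_gates`, `build_features`, `group_soft_threshold`, `objective`, `admm_gated_relu`) are hypothetical.

```python
import numpy as np


def sample_gates(X, num_gates, rng):
    """Sample activation patterns d_i = 1[X g_i >= 0] from random directions g_i."""
    G = rng.standard_normal((X.shape[1], num_gates))
    return (X @ G >= 0).astype(X.dtype)  # shape (n, P)


def build_features(X, gates):
    """Stack the blocks D_i X side by side: F = [D_1 X, ..., D_P X], shape (n, P*d)."""
    n, d = X.shape
    P = gates.shape[1]
    return (gates[:, :, None] * X[:, None, :]).reshape(n, P * d)


def group_soft_threshold(V, tau):
    """Proximal operator of tau * sum_i ||V_i||_2, applied to each row V_i."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * V


def objective(F, y, U, beta):
    """Convex objective: 0.5 * ||F u - y||^2 + beta * sum_i ||U_i||_2."""
    r = F @ U.ravel() - y
    return 0.5 * (r @ r) + beta * np.linalg.norm(U, axis=1).sum()


def admm_gated_relu(X, y, num_gates=8, beta=0.1, rho=1.0, iters=200, seed=0):
    """Plain ADMM on the split u = z for the group-lasso problem above."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    gates = sample_gates(X, num_gates, rng)
    F = build_features(X, gates)

    # The u-update is a ridge-like linear system with a fixed matrix.
    A = F.T @ F + rho * np.eye(num_gates * d)
    Fty = F.T @ y

    Z = np.zeros((num_gates, d))    # split copy of the weights
    Lam = np.zeros((num_gates, d))  # scaled dual variable
    for _ in range(iters):
        # u-update: minimize the smooth least-squares part plus the coupling term.
        rhs = Fty + rho * (Z - Lam).ravel()
        U = np.linalg.solve(A, rhs).reshape(num_gates, d)
        # z-update: group soft-thresholding handles the nonsmooth penalty.
        Z = group_soft_threshold(U + Lam, beta / rho)
        # Dual update on the constraint u = z.
        Lam = Lam + U - Z
    return Z, gates, F


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 3))
    y = np.maximum(X @ np.array([1.0, -2.0, 0.5]), 0.0)  # synthetic ReLU teacher
    Z, gates, F = admm_gated_relu(X, y)
    print("objective at zero:", 0.5 * (y @ y))
    print("objective after ADMM:", objective(F, y, Z, beta=0.1))
```

Because the activation patterns are fixed before optimization, the model is linear in the weights and the problem is convex, so the ADMM iterates approach a global minimizer of this simplified objective. A solver intended for ImageNet-scale data would need to avoid the dense solve used here.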