Abstract: SIGNSGD can dramatically improve the performance of training large neural networks by transmitting only the sign of each minibatch stochastic gradient, which compresses gradient communication while retaining a convergence rate on par with standard stochastic gradient descent (SGD). Meanwhile, the learning rate plays a vital role in training neural networks, but existing learning rate optimization strategies face the following problems: (1) Learning rate decay methods produce small learning rates that slow convergence, and they require extra hyper-parameters beyond the initial learning rate, which demands more manual tuning. (2) Adaptive gradient algorithms generalize poorly and also rely on additional hyper-parameters. (3) Generating learning rates via two-level optimization models is difficult and time-consuming during training. To this end, we propose, for the first time, an adaptive learning rate schedule for neural network training with the SIGNSGD optimizer. Our method builds on the theoretical observation that, in each iteration, the upper bound of the convergence rate can be minimized with respect to the current learning rate; as a result, the current learning rate can be written as a mathematical expression that depends only on historical learning rates. Then, given an initial value, the learning rates for different training stages are obtained adaptively. The proposed method has the following advantages: (1) It is an automatic method with no hyper-parameters other than one initial value, which reduces manual involvement. (2) It converges faster than, and outperforms, standard SGD. (3) It enables neural networks to achieve better performance with fewer gradient communication bits. Three numerical simulations are conducted on different neural networks with three public datasets, MNIST, CIFAR-10 and CIFAR-100, and the numerical results demonstrate the efficiency of the proposed approach.
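To make the sign-based update concrete, the following is a minimal sketch of a plain signSGD step on a toy quadratic objective. It is not the schedule proposed in the paper: the abstract does not give the learning rate expression, so `lr` here is a fixed placeholder, and the function names, the toy objective and the noise level are all assumptions made for illustration.

```python
import numpy as np


def sign_sgd_step(w, grad, lr):
    """One signSGD update: each coordinate moves by lr against the gradient's sign.

    Only np.sign(grad) is needed, which is why a worker can send one sign
    per coordinate instead of a full-precision gradient.
    """
    return w - lr * np.sign(grad)


def minimize_quadratic(w0, lr, steps, seed=0):
    """Run signSGD on the toy objective f(w) = 0.5 * ||w||^2 (assumed example).

    The stochastic gradient is the true gradient w plus small Gaussian noise.
    lr is a fixed placeholder, not the adaptive schedule from the paper.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        grad = w + 0.01 * rng.standard_normal(w.shape)
        w = sign_sgd_step(w, grad, lr)
    return w


if __name__ == "__main__":
    print(minimize_quadratic([1.0, -1.0], lr=0.01, steps=200))
```

With a fixed step size the iterates approach the minimizer and then oscillate around it at a scale set by `lr`, which is one way to see why the choice of learning rate at each stage matters for this optimizer.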

Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent

Optimizing Quantized Neural Networks in a Weak Curvature Manifold

Second-order Neural Network Training Using Complex-step Directional Derivative

Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training

Adaptive Partitioning and Efficient Scheduling for Distributed DNN Training in Heterogeneous IoT Environment

Scheduling Optimization Techniques for Neural Network Training

Gradient Descent Optimization in Deep Learning Model Training Based on Multistage and Method Combination Strategy

Reconstructing Deep Neural Networks: Unleashing the Optimization Potential of Natural Gradient Descent

Exact Gauss-Newton Optimization for Training Deep Neural Networks

Block-diagonal Hessian-free Optimization for Training Neural Networks

An Adaptive Learning Rate Schedule for SIGNSGD Optimizer in Neural Networks

Accelerated Gradient-free Neural Network Training by Multi-convex Alternating Optimization

Practical Quasi-Newton Methods for Training Deep Neural Networks

Learning To Optimize Quantum Neural Network Without Gradients

Distributed Newton Methods for Deep Neural Networks

Gradient Correction Beyond Gradient Descent

Stochastic Quasi-Newton Optimization in Large Dimensions Including Deep Network Training

Learning to Optimize Quasi-Newton Methods

A Novel Gradient Descent Optimizer based on Fractional Order Scheduler and its Application in Deep Neural Networks

PID Controller-Based Stochastic Optimization Acceleration for Deep Neural Networks

Automatic gradient descent with generalized Newton's method