Abstract: SIGNSGD can dramatically improve the performance of training large neural networks by transmitting only the sign of each minibatch stochastic gradient, which compresses gradient communication while keeping a convergence rate on the level of standard stochastic gradient descent (SGD). Meanwhile, the learning rate plays a vital role in training neural networks, but existing learning rate optimization strategies face the following problems: (1) learning rate decay methods produce small learning rates that slow convergence, and they require extra hyper-parameters beyond the initial learning rate, which increases manual effort; (2) adaptive gradient algorithms generalize poorly and also rely on additional hyper-parameters; (3) generating learning rates via two-level optimization models is difficult and time-consuming during training. To this end, we propose, for the first time, an adaptive learning rate schedule for neural network training with the SIGNSGD optimizer. Our method is motivated by the theoretical observation that the upper bound on the convergence rate can be minimized with respect to the current learning rate in each iteration, so that the current learning rate can be written as a mathematical expression that depends only on historical learning rates. Then, given an initial value, the learning rates at different training stages can be obtained adaptively. The proposed method has the following advantages: (1) it is automatic and needs no additional hyper-parameters beyond one initial value, which reduces manual effort; (2) it converges faster and outperforms standard SGD; (3) it lets neural networks achieve better performance with fewer gradient communication bits. Three numerical simulations are conducted on different neural networks with three public datasets, MNIST, CIFAR-10 and CIFAR-100, and numerical results are presented to demonstrate the efficiency of the proposed approach.
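
To make the communication saving concrete, the following is a minimal sketch of a single sign-based update, assuming NumPy and a single worker. The helper names (`compress_signs`, `decompress_signs`, `signsgd_step`) are hypothetical and are not taken from the paper. The learning rate `lr` is passed in as a plain number: the abstract does not state the adaptive schedule's formula, so it is not reproduced here.

```python
import numpy as np

def compress_signs(grad):
    # Hypothetical helper: keep 1 bit per coordinate instead of a 32-bit float.
    # Assumption of this sketch: a zero gradient entry is encoded as -1.
    return np.packbits(grad > 0), grad.size

def decompress_signs(packed, n):
    # Recover a vector of +1 / -1 values from the packed bits.
    bits = np.unpackbits(packed)[:n]
    return bits.astype(np.float64) * 2.0 - 1.0

def signsgd_step(params, grad, lr):
    # Only the packed signs would be transmitted; the update moves each
    # coordinate by exactly lr in the direction opposite to the gradient sign.
    packed, n = compress_signs(grad)
    return params - lr * decompress_signs(packed, n)

params = np.array([1.0, -2.0])
grad = np.array([0.3, -4.0])      # stand-in for a minibatch stochastic gradient
new_params = signsgd_step(params, grad, lr=0.1)
```

In this sketch a gradient with 1000 coordinates is packed into 125 bytes, compared with 4000 bytes for 32-bit floats. Because every coordinate moves by the same magnitude `lr`, the step size is determined entirely by the learning rate, which is why its schedule matters for this optimizer.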

Online Learning for DNN Training: A Stochastic Block Adaptive Gradient Algorithm

Distributed Adaptive Subgradient Algorithms for Online Learning over Time-Varying Networks

Reasonable Gradients for Online Training Algorithms in Spiking Neural Networks

Stochastic Gradient Methods with Block Diagonal Matrix Adaptation

A Randomized Block-Coordinate Adam online learning optimization algorithm

ABNGrad: adaptive step size gradient descent for optimizing neural networks

Variance Reduced Diffusion Adaptation for Online Learning over Networks

NDOT: Neuronal Dynamics-based Online Training for Spiking Neural Networks

An Adaptive Remote Stochastic Gradient Method for Training Neural Networks

An Adaptive and Momental Bound Method for Stochastic Learning

Randomized Block-Coordinate Adaptive Algorithms for Nonconvex Optimization Problems

Asgd: Stochastic Gradient Descent with Adaptive Batch Size for Every Parameter

DP-RBAdaBound: A Differentially Private Randomized Block-Coordinate Adaptive Gradient Algorithm for Training Deep Neural Networks

Asynchronous SGD with Stale Gradient Dynamic Adjustment for Deep Learning Training

Blockwise Adaptivity: Faster Training and Better Generalization in Deep Learning

AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks

Adaptive Gradient Methods with Dynamic Bound of Learning Rate

Stochastic Average Gradient: A Simple Empirical Investigation

Adaptive Stochastic Conjugate Gradient Optimization for Backpropagation Neural Networks

Online Training Through Time for Spiking Neural Networks

An Adaptive Learning Rate Schedule for SIGNSGD Optimizer in Neural Networks