Abstract: SIGNSGD can substantially improve the efficiency of training large neural networks by transmitting only the sign of each minibatch stochastic gradient, which compresses gradient communication while retaining a convergence rate comparable to that of standard stochastic gradient descent (SGD). Meanwhile, the learning rate plays a vital role in training neural networks, but existing learning rate optimization strategies mainly face the following problems: (1) learning rate decay methods produce small learning rates that slow convergence, and they require extra hyper-parameters beyond the initial learning rate, which increases manual tuning; (2) adaptive gradient algorithms generalize poorly and also introduce additional hyper-parameters; (3) generating learning rates via two-level optimization models is difficult and time-consuming during training. To this end, we propose, for the first time, a novel adaptive learning rate schedule for training neural networks with the SIGNSGD optimizer. Our method is motivated by the theoretical observation that, at each iteration, the upper bound on the convergence rate can be minimized with respect to the current learning rate; as a result, the current learning rate can be written as a mathematical expression that depends only on the historical learning rates. Given an initial value, learning rates for the different training stages are then obtained adaptively. The proposed method has the following advantages: (1) it is a novel automatic method that requires no hyper-parameters other than a single initial value, thus reducing manual intervention; (2) it converges faster than, and outperforms, standard SGD; (3) it enables neural networks to achieve better performance with fewer gradient communication bits. Three numerical simulations are conducted on different neural networks with three public datasets (MNIST, CIFAR-10 and CIFAR-100), and the numerical results demonstrate the efficiency of the proposed approach.
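The core SIGNSGD mechanism described above, sending only the sign of each stochastic gradient coordinate, can be sketched as follows. This is a minimal illustration under stated assumptions: a single worker, a toy quadratic objective, ties at zero mapped to +1 so each coordinate fits in exactly one bit, and a hypothetical `placeholder_lr` schedule (lr0 / sqrt(k + 1)) standing in for the adaptive schedule, whose exact expression is not given in this abstract. All function names are hypothetical.

```python
import math
from typing import List


def to_signs(grad: List[float]) -> List[int]:
    """Map each gradient coordinate to +1 or -1 (ties at 0 go to +1 so that
    every coordinate fits in exactly one bit)."""
    return [1 if g >= 0 else -1 for g in grad]


def pack_signs(signs: List[int]) -> bytes:
    """Pack +1/-1 signs into bytes, one bit per coordinate (bit set means +1)."""
    out = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s > 0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_signs(data: bytes, n: int) -> List[int]:
    """Recover n signs from the packed bytes produced by pack_signs."""
    return [1 if (data[i // 8] >> (i % 8)) & 1 else -1 for i in range(n)]


def signsgd_step(params: List[float], grad: List[float], lr: float) -> List[float]:
    """One SIGNSGD update x <- x - lr * sign(g), with the sign vector sent
    through a simulated 1-bit-per-coordinate channel."""
    payload = pack_signs(to_signs(grad))      # what a worker would transmit
    signs = unpack_signs(payload, len(grad))  # what the receiver decodes
    return [p - lr * s for p, s in zip(params, signs)]


def placeholder_lr(k: int, lr0: float = 0.5) -> float:
    """Hypothetical decaying schedule lr0 / sqrt(k + 1); NOT the adaptive
    schedule proposed in the abstract."""
    return lr0 / math.sqrt(k + 1)


if __name__ == "__main__":
    target = [3.0, -2.0]
    x = [0.0, 0.0]
    for k in range(2000):
        # Gradient of the toy quadratic f(x) = sum_i (x_i - t_i)^2
        grad = [2.0 * (xi - ti) for xi, ti in zip(x, target)]
        x = signsgd_step(x, grad, placeholder_lr(k))
    print("final x:", x)
    n = 1000
    print("bytes per step, float32 vs signs:", 4 * n, len(pack_signs([1] * n)))
```

Each step transmits one bit per coordinate instead of a 32-bit float, a 32x reduction in gradient payload; in the method summarized above, the adaptive rule would take the place of `placeholder_lr`.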

Optimal Linear Decay Learning Rate Schedules and Further Refinements

How to decay your learning rate

An automatic learning rate decay strategy for stochastic gradient descent optimization methods in neural networks

Locally Optimal Descent for Dynamic Stepsize Scheduling

On Learning Rates and Schrödinger Operators

Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses

Dynamic Learning Rate Decay for Stochastic Variational Inference

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

An Adaptive Learning Rate Schedule for SIGNSGD Optimizer in Neural Networks

Super Level Sets and Exponential Decay: A Synergistic Approach to Stable Neural Network Training

An Adaptive Mechanism to Achieve Learning Rate Dynamically

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

The Road Less Scheduled

Adaptive Gradient Methods with Dynamic Bound of Learning Rate

An optimization Strategy for Deep Neural Networks Training

The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms

Learning Stages: Phenomenon, Root Cause, Mechanism Hypothesis, and Implications

Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima

Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize

Improved Learning Rates for Stochastic Optimization: Two Theoretical Viewpoints

Harnessing Learn Rate Schedule for Adaptive Deep Learning in LoRaWAN-IoT Localization