Abstract: SIGNSGD can dramatically improve the performance of training large neural networks by transmitting only the sign of each minibatch stochastic gradient, which compresses gradient communication while keeping a convergence rate on the level of standard stochastic gradient descent (SGD). Meanwhile, the learning rate plays a vital role in training neural networks, but existing learning rate optimization strategies face the following problems: (1) learning rate decay methods produce small learning rates that lead to slow convergence, and they require extra hyper-parameters beyond the initial learning rate, which demands more human involvement. (2) Adaptive gradient algorithms generalize poorly and also rely on additional hyper-parameters. (3) Generating learning rates via two-level optimization models is difficult and time-consuming during training. To this end, we propose, for the first time, an adaptive learning rate schedule for neural network training with the SIGNSGD optimizer. Our method is inspired by the theoretical observation that the upper bound on the convergence rate can be minimized with respect to the current learning rate in each iteration, so the current learning rate can be written as a mathematical expression that depends only on historical learning rates. Then, given an initial value, the learning rates for the different training stages are obtained adaptively. The proposed method has the following advantages: (1) it is an automatic method with no additional hyper-parameters other than one initial value, which reduces manual involvement. (2) It converges faster and outperforms standard SGD. (3) It enables neural networks to achieve better performance with fewer gradient communication bits. Three numerical simulations are conducted on different neural networks with three public datasets, MNIST, CIFAR-10 and CIFAR-100, and several numerical results are presented to demonstrate the efficiency of the proposed approach.
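To make the sign-based update concrete, the following is a minimal NumPy sketch of one SIGNSGD step and of packing gradient signs into one bit per coordinate. It is an illustration only, not the authors' implementation: the function names `sign_sgd_step` and `pack_signs` are hypothetical, and the learning rate is passed in as a plain number because the abstract does not give the closed-form expression for the adaptive schedule.

```python
import numpy as np

def sign_sgd_step(params, grad, lr):
    """One SIGNSGD update: each coordinate moves by lr against the gradient's sign."""
    return params - lr * np.sign(grad)

def pack_signs(grad):
    """Encode gradient signs as 1 bit per coordinate (1 = non-negative, 0 = negative).

    Assumption for this sketch: a zero gradient entry is encoded as non-negative.
    """
    return np.packbits(grad >= 0)

# Toy objective f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = np.array([1.0, -2.0, 0.5])
lr = 0.1  # placeholder value; the proposed schedule would supply this per iteration
for _ in range(5):
    x = sign_sgd_step(x, grad=x, lr=lr)

# A float32 gradient costs 32 bits per coordinate; the sign encoding costs 1.
grad = np.ones(1024, dtype=np.float32)
full_bits = grad.size * 32
sign_bits = pack_signs(grad).size * 8
```

In this sketch the step size is the same for every coordinate regardless of gradient magnitude, which is why the choice of learning rate at each iteration matters so much for sign-based methods.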

PANDA: Population Automatic Neural Distributed Algorithm for Deep Leaning

DPLRS: Distributed Population Learning Rate Schedule

Leader Population Learning Rate Schedule

Efficient Hyperparameter Optimization with Probability-based Resource Allocating on Deep Neural Networks

Adaptive Distributed Parallel Training Method for a Deep Learning Model Based on Dynamic Critical Paths of DAG

An Optimization Strategy for Deep Neural Networks Training

An Adaptive Learning Rate Schedule for SIGNSGD Optimizer in Neural Networks

Adaptive Partitioning and Efficient Scheduling for Distributed DNN Training in Heterogeneous IoT Environment

PID Controller-Based Stochastic Optimization Acceleration for Deep Neural Networks

An Adaptive Mechanism to Achieve Learning Rate Dynamically

DALU: Adaptive Learning Rate Update in Distributed Deep Learning

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

Accelerating Massively Distributed Deep Learning Through Efficient Pseudo-Synchronous Update Method

Aware: Adaptive Distributed Training with Computation, Communication and Position Awareness for Deep Learning Model.

PANDA: AdaPtive Noisy Data Augmentation for Regularization of Undirected Graphical Models

An automatic learning rate decay strategy for stochastic gradient descent optimization methods in neural networks

ADPA Optimization for Real-Time Energy Management Using Deep Learning

An Optimal Network-Aware Scheduling Technique for Distributed Deep Learning in Distributed HPC Platforms

An Efficient Optimization Technique for Training Deep Neural Networks

Model-Aware Parallelization Strategy for Deep Neural Networks' Distributed Training

Adaptive learning rate optimization algorithms with dynamic bound based on Barzilai-Borwein method