PANDA: Population Automatic Neural Distributed Algorithm for Deep Learning

Jia Wei, Xingjun Zhang, Zeyu Ji, Jingbo Li, Zheng Wei
DOI: https://doi.org/10.1109/ispa-bdcloud-socialcom-sustaincom52081.2021.00187
2021-01-01
Abstract: Deep neural network models perform remarkably well in the field of artificial intelligence, but their success depends on hyperparameters. The learning rate schedule is one of the most important of these, and searching for a good schedule is often time-consuming and computationally expensive. In this paper, we propose the Population Automatic Neural Distributed Algorithm (PANDA), which is based on population joint optimization. PANDA uses distributed data-parallel deep neural network training to implement a population-based strategy for dynamically optimizing the learning rate schedule. With almost no loss of test accuracy, PANDA refines the learning rate schedule dynamically during model training instead of following the usual suboptimal strategy. We conducted experiments on the typical AlexNet, VGG16, and ResNet18 models using the Tianhe-3 supercomputing prototype. The results show that using PANDA to dynamically update the learning rate greatly reduces the learning rate schedule search time while achieving performance close to that of the latest population-based hyperparameter algorithm. In our experiments, PANDA achieved a speedup of up to 123.85x, and the results demonstrate the effectiveness and robustness of PANDA.
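The abstract does not spell out PANDA's update rule. As a rough illustration of the population idea it builds on, here is a minimal sketch of a generic population-based training (PBT-style) exploit/explore loop over learning rates. This is not PANDA itself. It assumes a single-process loop in place of distributed data-parallel training and a toy quadratic loss standing in for a network's validation loss, and every name in it (`Worker`, `train_steps`, `exploit_and_explore`, `pbt`) is hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Worker:
    """One population member (hypothetical): a model weight plus its current learning rate."""
    weight: float
    lr: float
    lr_history: list = field(default_factory=list)  # learning rate used at each step


def loss(w: float) -> float:
    # Toy quadratic objective standing in for a network's validation loss (assumption).
    return (w - 3.0) ** 2


def grad(w: float) -> float:
    # Gradient of the toy loss above.
    return 2.0 * (w - 3.0)


def train_steps(worker: Worker, steps: int) -> None:
    # Plain gradient descent with the worker's current learning rate.
    for _ in range(steps):
        worker.weight -= worker.lr * grad(worker.weight)
        worker.lr_history.append(worker.lr)


def exploit_and_explore(pop, rng, frac=0.25, factors=(0.8, 1.2)):
    # Rank by loss; the worst fraction copies a top performer (exploit)
    # and then perturbs the copied learning rate (explore).
    ranked = sorted(pop, key=lambda w: loss(w.weight))
    n = max(1, int(len(pop) * frac))
    top, bottom = ranked[:n], ranked[-n:]
    for loser in bottom:
        winner = rng.choice(top)
        loser.weight = winner.weight
        loser.lr_history = list(winner.lr_history)  # inherit the schedule so far
        loser.lr = winner.lr * rng.choice(factors)


def pbt(pop_size=8, rounds=10, steps_per_round=5, seed=0):
    rng = random.Random(seed)
    # Initial learning rates sampled log-uniformly between 1e-3 and 1.
    pop = [Worker(weight=0.0, lr=10 ** rng.uniform(-3, 0)) for _ in range(pop_size)]
    for _ in range(rounds):
        for w in pop:
            train_steps(w, steps_per_round)
        exploit_and_explore(pop, rng)
    # The best worker's lr_history is the learning rate schedule found by the search.
    return min(pop, key=lambda w: loss(w.weight))


if __name__ == "__main__":
    best = pbt()
    print(f"final loss: {loss(best.weight):.6f}")
    print("schedule (one value per round):", [round(lr, 4) for lr in best.lr_history[::5]])
```

Because losers inherit both the weights and the learning rate history of a winner, the surviving history is a piecewise schedule assembled during a single training run rather than a fixed schedule chosen up front. The selection rule used here (truncation selection with a x0.8/x1.2 perturbation) is one common PBT choice, not necessarily the one PANDA uses.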