GSdyn: Learning training dynamics via online Gaussian optimization with gradient states

Haoran Liao, Junchi Yan, Zimin Feng
2021-05-04
Abstract: Bayesian optimization, whose efficiency for automatic hyperparameter tuning has been verified over the past decade, still faces a standing dilemma between massive time consumption and suboptimal search results. Although much effort has been devoted to accelerating and improving the optimizer, the dominant time cost, the evaluation step, has received relatively little attention. In this paper, we propose a novel online Bayesian algorithm that optimizes hyperparameters while learning the training dynamics, freeing it from repeated complete evaluations. To address the non-stationarity problem, i.e., that the same hyperparameters lead to varying results at different training steps, we combine the training loss and the dominant eigenvalue to track training dynamics. Compared to traditional algorithms, it saves time and exploits important intermediate information that classical Bayesian methods, which focus only on final results, do not leverage well. Results on CIFAR-10 and CIFAR-100 verify the efficacy of our approach.
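The abstract does not say which matrix the dominant eigenvalue is taken from or how it is computed. As an illustration only, the sketch below assumes a symmetric curvature matrix (for example, a loss Hessian) accessed through a matrix-vector product, and estimates its dominant eigenvalue by power iteration. The function `dominant_eigenvalue`, the toy quadratic loss, and the `(loss, eigenvalue)` tuple are hypothetical names and choices, not the paper's implementation.

```python
import math
import random


def dominant_eigenvalue(matvec, dim, iters=200, seed=0):
    """Estimate the largest-magnitude eigenvalue of a symmetric operator.

    `matvec` maps a list of `dim` floats to the operator applied to it.
    This is plain power iteration, an assumed stand-in for whatever
    estimator the paper actually uses.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    eig = 0.0
    for _ in range(iters):
        w = matvec(v)
        # Rayleigh quotient v^T A v for the current unit vector v.
        eig = sum(a * b for a, b in zip(v, w))
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return eig


# Toy example (assumption): quadratic loss 0.5 * x^T A x with diagonal A,
# so the Hessian is A itself.
diag = [3.0, 1.0, 0.5]
x = [1.0, -2.0, 0.5]
loss = 0.5 * sum(d * c * c for d, c in zip(diag, x))
top_eig = dominant_eigenvalue(lambda v: [d * c for d, c in zip(diag, v)], dim=3)

# A hypothetical per-step "state" pairing the two signals named in the abstract.
state = (loss, top_eig)
```

In a real network the matrix would not be formed explicitly; `matvec` would be replaced by a Hessian-vector product, which is again an assumption about the setup rather than something the abstract states.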
Computer Science