Abstract: Algorithm selection and hyperparameter optimization are tedious tasks that must be dealt with when applying machine learning to real-world problems. Sequential model-based optimization (SMBO), based on so-called “surrogate models”, has been employed to allow for faster and more direct hyperparameter optimization. A surrogate model is a machine learning regression model that is trained on the meta-level instances in order to predict the performance of an algorithm on a specific data set, given the hyperparameter settings and data set descriptors. Gaussian processes, for example, make good surrogate models because they provide probability distributions over labels. Recent work on SMBO also incorporates meta-data, i.e., observed hyperparameter performances on other data sets, into the process of hyperparameter optimization. This can, for example, be accomplished by learning transfer surrogate models on all available instances of meta-knowledge. However, the growing amount of meta-information can make Gaussian processes infeasible, because they require the inversion of a covariance matrix whose size grows with the number of instances. Consequently, instead of learning a joint surrogate model on all of the meta-data, we propose to learn an individual surrogate model on the observations of each data set and then combine all surrogates into a joint one using ensembling techniques. The final surrogate is a weighted sum of all data-set-specific surrogates plus an additional surrogate that is learned solely on the target observations. Any surrogate model can be used within our framework; we explore Gaussian processes in this scenario. We present two different strategies for finding the ensemble weights: the first is based on a probabilistic product-of-experts approach, and the second is based on kernel regression.
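The ensemble idea can be illustrated with a minimal NumPy sketch. It fits one small Gaussian process per data set, so each matrix factorization is only as large as a single data set rather than the whole meta-data. It then combines the per-data-set surrogates in two ways. Every name here (`GPSurrogate`, `kernel_regression_weights`, `ensemble_predict`, `product_of_experts`) is hypothetical. The specific choices are assumptions for illustration and not necessarily the formulas used in the paper: a fixed RBF kernel, a ranking-based distance (fraction of discordant pairs on the target observations), an Epanechnikov kernel with bandwidth 0.5, variances combined as Σ wᵢ²σᵢ², and a precision-weighted ("generalized") product-of-experts rule.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and B (unit signal variance)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale ** 2)


class GPSurrogate:
    """Zero-mean GP regressor with fixed kernel hyperparameters (hypothetical helper)."""

    def __init__(self, length_scale=1.0, noise=1e-6):
        self.length_scale = length_scale
        self.noise = noise

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        K = rbf_kernel(self.X, self.X, self.length_scale) + self.noise * np.eye(len(self.X))
        # O(n^3) factorization; n is the size of ONE data set, not of all meta-data.
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, Xs):
        """Return predictive mean and variance at the rows of Xs."""
        Ks = rbf_kernel(np.asarray(Xs, dtype=float), self.X, self.length_scale)
        v = np.linalg.solve(self.L, Ks.T)
        var = np.clip(1.0 - (v ** 2).sum(axis=0), 1e-12, None)
        return Ks @ self.alpha, var


def discordant_pair_fraction(pred, y_target):
    """Fraction of pairs of target observations whose order `pred` gets wrong."""
    n = len(y_target)
    pairs = [(j, k) for j in range(n) for k in range(j + 1, n)]
    if not pairs:
        return 0.0
    wrong = sum((pred[j] < pred[k]) != (y_target[j] < y_target[k]) for j, k in pairs)
    return wrong / len(pairs)


def epanechnikov(t):
    return 0.75 * (1.0 - t ** 2) if abs(t) <= 1.0 else 0.0


def kernel_regression_weights(base, X_target, y_target, bandwidth=0.5):
    """Kernel-regression-style weights: data sets that rank the target points alike weigh more."""
    weights = [epanechnikov(discordant_pair_fraction(s.predict(X_target)[0], y_target) / bandwidth)
               for s in base]
    weights.append(epanechnikov(0.0))  # target surrogate: distance 0 to itself
    return np.array(weights)


def ensemble_predict(surrogates, weights, Xs):
    """Normalized weighted sum of surrogates; variances add as sum w_i^2 var_i (independence assumed)."""
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    preds = [s.predict(Xs) for s in surrogates]
    mean = sum(w * m for w, (m, _) in zip(weights, preds))
    var = sum(w ** 2 * v for w, (_, v) in zip(weights, preds))
    return mean, var


def product_of_experts(surrogates, betas, Xs):
    """Generalized product of experts: precision-weighted combination of Gaussian predictions."""
    preds = [s.predict(Xs) for s in surrogates]
    precision = sum(b / v for b, (_, v) in zip(betas, preds))
    mean = sum(b * m / v for b, (m, v) in zip(betas, preds)) / precision
    return mean, 1.0 / precision


# Toy usage with hypothetical meta-data: two source data sets and one target data set.
X_grid = np.linspace(0.0, 1.0, 20)[:, None]
sources = [GPSurrogate(0.2).fit(X_grid, np.sin(6 * X_grid[:, 0] + shift)) for shift in (0.0, 2.0)]
X_t = np.array([[0.1], [0.5], [0.9]])
y_t = np.sin(6 * X_t[:, 0])
target = GPSurrogate(0.2).fit(X_t, y_t)
w = kernel_regression_weights(sources, X_t, y_t)  # unshifted source ranks the target points correctly
mean, var = ensemble_predict(sources + [target], w, X_grid)
```

In this toy run, the source whose response curve matches the target keeps a nonzero weight. The shifted source orders the three target points differently and is weighted out. The target surrogate always contributes, because its distance to itself is zero.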
Additionally, we extend the framework to directly estimate the acquisition function in the same setting, using a novel technique that we call the “transfer acquisition function”. In an empirical evaluation that includes comparisons to the current state of the art on two publicly available meta-data sets, we demonstrate that our proposed approach not only scales to large meta-data but also finds stronger prediction models.
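The abstract does not state the formula for the transfer acquisition function. One plausible reading, adopted here purely as an assumption, combines two weighted terms. The first is the expected improvement (EI) under the target surrogate. The second is, for each data-set-specific surrogate, the improvement its mean predicts over the best target configuration evaluated so far. The weights are ensemble weights like those described above. The plain-Python sketch below implements that reading for maximization. `transfer_acquisition`, `source_surrogate`, `target_surrogate`, and the toy numbers are all hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

# A surrogate maps a configuration x to (predictive mean, predictive std).
Surrogate = Callable[[float], Tuple[float, float]]


def expected_improvement(mean: float, std: float, y_best: float) -> float:
    """Closed-form expected improvement for maximization under a Gaussian prediction."""
    if std <= 0.0:
        return max(mean - y_best, 0.0)
    z = (mean - y_best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - y_best) * cdf + std * pdf


def transfer_acquisition(x: float, base: Sequence[Surrogate], target: Surrogate,
                         weights: Sequence[float], X_target: Sequence[float],
                         y_target: Sequence[float]) -> float:
    """Hypothetical transfer acquisition: weighted mix of target EI and base improvements.

    weights[-1] belongs to the target surrogate, weights[:-1] to the base surrogates.
    """
    mu_t, sd_t = target(x)
    score = weights[-1] * expected_improvement(mu_t, sd_t, max(y_target))
    for w, surrogate in zip(weights[:-1], base):
        # Improvement of this base surrogate's mean over the best value it predicts
        # among the configurations already evaluated on the target data set.
        best_seen = max(surrogate(xt)[0] for xt in X_target)
        score += w * max(surrogate(x)[0] - best_seen, 0.0)
    return score / sum(weights)


# Toy usage with hypothetical surrogates (negated quadratic means, constant stds).
def source_surrogate(x: float) -> Tuple[float, float]:
    return -(x - 0.7) ** 2, 0.1


def target_surrogate(x: float) -> Tuple[float, float]:
    return -(x - 0.6) ** 2, 0.2


X_t, y_t = [0.2, 0.5], [-0.16, -0.01]
candidates = [i / 10 for i in range(11)]
scores = [transfer_acquisition(c, [source_surrogate], target_surrogate, [0.75, 0.75], X_t, y_t)
          for c in candidates]
best = candidates[scores.index(max(scores))]
```

Here the source surrogate pulls the proposal from the target surrogate's optimum (0.6) toward 0.7. The base term rewards configurations the source expects to beat what has already been tried. The loop over `candidates` stands in for whatever inner optimizer an SMBO implementation would use to maximize the acquisition function.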
