Yucen Lily Li, Tim G. J. Rudner, Andrew Gordon Wilson
Abstract: Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models which are easy to optimize and support exact inference. While standard GP surrogates have been well-established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practical function approximators, with many benefits over standard GPs such as the ability to naturally handle non-stationarity and learn representations for high-dimensional data. In this paper, we study BNNs as alternatives to standard GP surrogates for optimization. We consider a variety of approximate inference procedures for finite-width BNNs, including high-quality Hamiltonian Monte Carlo, low-cost stochastic MCMC, and heuristics such as deep ensembles. We also consider infinite-width BNNs, linearized Laplace approximations, and partially stochastic models such as deep kernel learning. We evaluate this collection of surrogate models on diverse problems with varying dimensionality, number of objectives, non-stationarity, and discrete and continuous inputs. We find: (i) the ranking of methods is highly problem dependent, suggesting the need for tailored inductive biases; (ii) HMC is the most successful approximate inference procedure for fully stochastic BNNs; (iii) full stochasticity may be unnecessary as deep kernel learning is relatively competitive; (iv) deep ensembles perform relatively poorly; (v) infinite-width BNNs are particularly promising, especially in high dimensions.
### What problem does this paper attempt to solve?
This paper mainly explores the selection of surrogate models in Bayesian Optimization (BO). Specifically, the authors study the application of Bayesian Neural Networks (BNNs) as surrogate models in Bayesian Optimization and compare them with traditional Gaussian Processes (GPs).
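For readers new to the setting, the loop that a surrogate model plugs into can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes a 1-D toy objective (`toy_objective` is hypothetical), an exact GP with a fixed squared-exponential kernel, a candidate grid instead of a continuous acquisition optimizer, and the closed-form expected improvement acquisition. All names and hyperparameters are assumptions chosen for brevity.

```python
import math
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    # Squared-exponential kernel between two sets of 1-D points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    # Exact zero-mean GP posterior mean and standard deviation at x_test.
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_ts)
    mean = k_ts.T @ alpha
    # The prior variance of this kernel is 1 at every input.
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)

def expected_improvement(mean, std, best):
    # Closed-form EI for maximization under a Gaussian predictive distribution.
    z = (mean - best) / std
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    # Clip tiny negative values caused by floating-point rounding.
    return np.maximum((mean - best) * cdf + std * pdf, 0.0)

def toy_objective(x):
    # Hypothetical stand-in for an expensive black-box function.
    return -(x - 0.7) ** 2

def bayes_opt(n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)  # candidate inputs
    x = rng.uniform(0.0, 1.0, size=n_init)
    y = toy_objective(x)
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, grid)             # fit the surrogate
        ei = expected_improvement(mean, std, y.max())    # score candidates
        x_next = grid[np.argmax(ei)]                     # pick the next query
        x = np.append(x, x_next)
        y = np.append(y, toy_objective(x_next))          # query the objective
    return x, y
```

Swapping the surrogate means replacing `gp_posterior` with another model that returns a predictive mean and uncertainty (or posterior samples) at the candidate points; the rest of the loop is unchanged.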
#### Main problems:
1. **Surrogate model selection**: Gaussian Processes (GPs) are the most commonly used surrogate models in Bayesian Optimization, but standard GPs have limitations: they do not naturally handle non-stationarity or learn representations for high-dimensional data. The authors therefore ask whether other surrogate models, especially Bayesian Neural Networks (BNNs), should be considered to improve the performance of Bayesian Optimization.
2. **Applicability of BNNs**: In recent years, significant progress has been made in the research of Bayesian Neural Networks, which makes BNNs a potential surrogate model. However, current research on the performance of BNNs in Bayesian Optimization and their advantages and disadvantages relative to GPs is very limited. Therefore, the authors hope to understand the performance of BNNs under different conditions through systematic experimental evaluation.
3. **Impact of approximate inference methods**: The training of Bayesian Neural Networks usually requires the use of approximate inference methods, such as Hamiltonian Monte Carlo (HMC), Stochastic Gradient Hamiltonian Monte Carlo (SGHMC), and Deep Ensembles. How different approximate inference methods affect the performance of Bayesian Optimization is also an important research direction in this paper.
4. **Handling of high-dimensional and multi-objective problems**: Bayesian Optimization is often applied to high-dimensional input spaces and multi-objective optimization problems. The authors evaluate the potential of BNNs in practical application scenarios by studying their performance on these problems.
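Sample-based approximate inference methods such as HMC, SGHMC, and deep ensembles all produce a finite set of plausible functions rather than a closed-form predictive distribution. One common way to use such samples in Bayesian Optimization is a Monte Carlo acquisition function. The sketch below assumes the samples have already been evaluated on a set of candidate inputs; the function names are illustrative and this is not necessarily the exact setup used in the paper.

```python
import numpy as np

def mc_expected_improvement(function_samples, best_observed):
    # function_samples has shape (S, N): S posterior draws (HMC/SGHMC samples
    # or ensemble members), each evaluated at the same N candidate inputs.
    improvement = np.maximum(function_samples - best_observed, 0.0)
    # Average the improvement over draws to estimate EI at each candidate.
    return improvement.mean(axis=0)

def predictive_moments(function_samples):
    # Predictive mean and spread across draws at each candidate input.
    return function_samples.mean(axis=0), function_samples.std(axis=0)

# Two draws evaluated at three candidates, with best observed value 1.0.
samples = np.array([[0.0, 1.0, 3.0],
                    [0.0, 2.0, 1.0]])
ei = mc_expected_improvement(samples, best_observed=1.0)
next_index = int(np.argmax(ei))  # candidate to query next
```

Under this view, the inference methods differ only in how the rows of `function_samples` are generated, which is why the quality of the approximate posterior can directly change which point is queried next.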
#### Research content:
- **Model types**: Including Bayesian Neural Networks with finite width, Bayesian Neural Networks with infinite width, Linearized Laplace Approximation (LLA), Deep Kernel Learning (DKL), etc.
- **Approximate inference methods**: Such as HMC, SGHMC, Deep Ensembles, etc.
- **Experimental settings**: Covering synthetic benchmarks, real-world applications, non-stationary objective functions, etc.
Through these studies, the authors hope to provide a more comprehensive surrogate model selection framework for Bayesian Optimization and reveal the advantages and limitations of BNNs under different conditions.
### Conclusion:
The main conclusions of the paper include:
1. Different approximate inference methods have a great impact on the performance of BNNs, among which HMC usually performs the best.
2. In some cases, Deep Kernel Learning (DKL) can compete with GPs, especially in high-dimensional input spaces.
3. The performance of Deep Ensembles is relatively poor and may not be suitable for Bayesian Optimization.
4. Bayesian Neural Networks with infinite width perform excellently in high-dimensional optimization problems, especially when dealing with non-Euclidean similarity metrics.
In general, this paper provides a new perspective for Bayesian Optimization, indicating that BNNs may become an effective alternative to traditional GPs, especially when dealing with complex and high-dimensional problems.
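To make the remark about non-Euclidean similarity concrete: an infinite-width BNN is equivalent to a GP whose kernel is defined by a layer-wise recursion. The sketch below implements the standard recursion for a fully connected network with ReLU activations. The architecture, depth, and variance hyperparameters are assumptions for illustration and may differ from the paper's configuration.

```python
import numpy as np

def relu_nngp_kernel(x1, x2, depth=3, weight_var=2.0, bias_var=0.1):
    # Kernel of an infinitely wide, fully connected ReLU network.
    # x1 has shape (n1, d), x2 has shape (n2, d); returns an (n1, n2) matrix.
    d = x1.shape[1]
    # Input layer: covariances come from scaled inner products.
    k12 = bias_var + weight_var * (x1 @ x2.T) / d
    k11 = bias_var + weight_var * np.sum(x1 ** 2, axis=1) / d
    k22 = bias_var + weight_var * np.sum(x2 ** 2, axis=1) / d
    for _ in range(depth):
        norm = np.sqrt(np.outer(k11, k22))
        cos_theta = np.clip(k12 / norm, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # Arc-cosine update for the cross-covariance after a ReLU layer.
        k12 = bias_var + weight_var / (2.0 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * cos_theta
        )
        # Same update for the variances, where theta = 0.
        k11 = bias_var + 0.5 * weight_var * k11
        k22 = bias_var + 0.5 * weight_var * k22
    return k12
```

The resulting matrix can replace a squared-exponential kernel in an exact GP posterior computation. Unlike a stationary kernel, it depends on the inputs through their inner products and norms rather than only through the distance between them.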