PFNs4BO: In-Context Learning for Bayesian Optimization

Samuel Müller, Matthias Feurer, Noah Hollmann, Frank Hutter
2023-07-22
Abstract: In this paper, we use Prior-data Fitted Networks (PFNs) as a flexible surrogate for Bayesian Optimization (BO). PFNs are neural processes that are trained to approximate the posterior predictive distribution (PPD) through in-context learning on any prior distribution that can be efficiently sampled from. We describe how this flexibility can be exploited for surrogate modeling in BO. We use PFNs to mimic a naive Gaussian process (GP), an advanced GP, and a Bayesian Neural Network (BNN). In addition, we show how to incorporate further information into the prior, such as allowing hints about the position of optima (user priors), ignoring irrelevant dimensions, and performing non-myopic BO by learning the acquisition function. The flexibility underlying these extensions opens up vast possibilities for using PFNs for BO. We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. We publish code alongside trained models at http://github.com/automl/PFNs4BO.
Machine Learning
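The abstract's requirement that the prior only needs to be "efficiently sampled from" can be made concrete. Below is a minimal Python sketch of drawing one synthetic dataset from a prior, assuming a zero-mean GP with an RBF kernel, inputs drawn uniformly from the unit hypercube, and a hypothetical `p_irrelevant` parameter that switches off input dimensions at random, loosely mirroring the idea of a prior over irrelevant dimensions. The function name, kernel, and all parameter values are illustrative assumptions, not the priors used in the paper.

```python
import numpy as np


def sample_gp_prior_dataset(n_points, n_dims, lengthscale=0.2, noise_std=0.01,
                            p_irrelevant=0.0, rng=None):
    """Draw one synthetic dataset (X, y) from a zero-mean GP prior with an RBF kernel.

    Hypothetical helper for illustration; not the paper's prior.
    """
    rng = np.random.default_rng(rng)
    # Inputs are drawn uniformly from the unit hypercube.
    X = rng.uniform(0.0, 1.0, size=(n_points, n_dims))
    # Hypothetical mechanism: each dimension is irrelevant with probability p_irrelevant.
    relevant = rng.uniform(size=n_dims) >= p_irrelevant
    Xr = X[:, relevant]
    # Pairwise squared distances over the relevant dimensions only, so the
    # sampled function is constant along irrelevant dimensions.
    sq_dist = np.sum((Xr[:, None, :] - Xr[None, :, :]) ** 2, axis=-1)
    K = np.exp(-0.5 * sq_dist / lengthscale ** 2)
    # Jitter on the diagonal keeps the Cholesky factorization numerically stable.
    L = np.linalg.cholesky(K + 1e-6 * np.eye(n_points))
    # Latent function values, approximately distributed as N(0, K).
    f = L @ rng.standard_normal(n_points)
    # Observations add i.i.d. Gaussian noise.
    y = f + noise_std * rng.standard_normal(n_points)
    return X, y, relevant
```

During PFN training, many such datasets would be drawn, each split into context points and held-out query points, and the network trained to predict the query targets from the context; the trained network then approximates the PPD under this prior. Swapping in a different sampler (for example, one drawing functions from a BNN) changes the prior without changing the training procedure.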
What problem does this paper attempt to address?
The paper aims to make Bayesian Optimization (BO) for Hyperparameter Optimization (HPO) more efficient and more flexible. It proposes Prior-data Fitted Networks (PFNs) as a replacement for the Gaussian processes (GPs) traditionally used as the surrogate model in BO. A PFN is a neural network trained on datasets sampled from a prior to approximate the posterior predictive distribution (PPD), so approximate Bayesian inference takes a single forward pass. Besides being computationally efficient, this allows any prior that can be sampled efficiently to be used, which makes the surrogate considerably more flexible and widely applicable.

### Main contributions:

1. **Flexibility**: PFNs can mimic different surrogate models, such as a naive GP, an advanced GP, and a Bayesian Neural Network (BNN). Additional information can also be built into the prior, such as user hints about the location of the optimum, the possibility that some dimensions are irrelevant, and non-myopic BO.
2. **Performance**: Large-scale experiments on artificially generated GP samples and three HPO testbeds (HPO-B, Bayesmark, and PD1) show that PFNs perform strongly across benchmarks and are competitive with, or better than, traditional GPs fitted with empirical Bayes.
3. **Extensibility**: Performance can be improved further by combining PFNs with input warping and gradient-based optimization of the acquisition function. The paper also shows how PFNs can approximate non-myopic acquisition functions, which account for the effect of future evaluations rather than only the next one.
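Because a PFN returns the whole predictive distribution in a single forward pass, acquisition values can be read off it directly. PFNs represent the PPD as a discretized distribution over bins of y; the sketch below assumes a simpler piecewise-uniform density on bounded bins (the tail handling of the actual output layer is ignored) and computes expected improvement (EI) for maximization in closed form. The function name and argument layout are hypothetical.

```python
import numpy as np


def expected_improvement_from_bins(borders, probs, best_y):
    """EI for maximization under a piecewise-uniform predictive density.

    borders: strictly increasing array of K + 1 bin edges.
    probs:   K bin probabilities summing to 1 (e.g. a softmax over network logits).
    best_y:  incumbent, i.e. the best value observed so far.
    """
    borders = np.asarray(borders, dtype=float)
    probs = np.asarray(probs, dtype=float)
    lo, hi = borders[:-1], borders[1:]
    # Per-bin E[max(y - best_y, 0)] for y ~ Uniform(lo, hi):
    # whole bin above the incumbent -> bin midpoint minus incumbent.
    whole = (lo + hi) / 2.0 - best_y
    # incumbent inside the bin -> integrate (y - best_y) from best_y to hi.
    partial = (hi - best_y) ** 2 / (2.0 * (hi - lo))
    # whole bin below the incumbent -> no improvement possible.
    per_bin = np.where(best_y <= lo, whole, np.where(best_y >= hi, 0.0, partial))
    return float(np.sum(probs * per_bin))
```

For instance, two equal-probability bins on [0, 1] and [1, 2] with incumbent 0.5 give EI = 0.5625, the same value as a uniform density on [0, 2]. Since EI here is a weighted sum of the bin probabilities, writing the same expression in an autodiff framework makes it differentiable with respect to the query input through the network, which is what gradient-based acquisition function optimization relies on.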
### Specific problems solved:

- **Limitations of traditional Gaussian processes**: Exact GP inference scales cubically with the number of observations, and GPs assume the data are jointly Gaussian, which can cause model mismatch on heavy-tailed data. Encoding the prior through a fixed kernel function further limits their flexibility.
- **Challenges in hyperparameter optimization**: HPO is a crucial task for machine learning algorithms, but traditional BO methods struggle with high-dimensional search spaces and with non-stationary, heteroscedastic objective functions.
- **Integration of user knowledge**: The paper proposes a way for users to supply prior knowledge about the likely location of the optimum during optimization, which can improve optimization performance.

### Conclusion:

By introducing PFNs, the paper provides a flexible and efficient approach to BO that is particularly well suited to HPO. PFNs match or outperform traditional methods and adapt more readily to complex priors and user knowledge, which makes them a significant advance in Bayesian optimization.
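The computational cost attributed to traditional GPs above comes from exact inference: every posterior update factorizes an n × n kernel matrix, which costs O(n^3) in the number of observations n, whereas a trained PFN produces its prediction in a single forward pass. Below is a minimal sketch of exact GP prediction, assuming a zero-mean prior, an RBF kernel with unit prior variance, and fixed (not fitted) hyperparameters; the function names and default values are illustrative.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=0.2):
    """RBF kernel matrix between the rows of A (n, d) and B (m, d); unit prior variance."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)


def gp_posterior_predictive(X, y, X_query, lengthscale=0.2, noise_var=1e-4):
    """Exact GP posterior predictive mean and variance of y at X_query."""
    K = rbf_kernel(X, X, lengthscale) + noise_var * np.eye(len(X))
    # Cholesky factorization: the O(n^3) step in the number of observations n.
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y via two solves against the triangular factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf_kernel(X, X_query, lengthscale)  # shape (n, m)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance 1, minus the explained part, plus observation noise.
    var = 1.0 - np.sum(v ** 2, axis=0) + noise_var
    return mean, var
```

With empirical Bayes, as in the GP baselines the paper compares against, the kernel hyperparameters are additionally fitted by maximizing the marginal likelihood, which repeats this factorization at every optimizer step.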