Using Large Language Models for Hyperparameter Optimization

Michael R. Zhang, Nishkrit Desai, Juhan Bae, Jonathan Lorraine, Jimmy Ba
2023-12-08
Abstract: This paper studies using foundational large language models (LLMs) to make decisions during hyperparameter optimization (HPO). Empirical evaluations demonstrate that in settings with constrained search budgets, LLMs can perform comparably or better than traditional HPO methods like random search and Bayesian optimization on standard benchmarks. Furthermore, we propose to treat the code specifying our model as a hyperparameter, which the LLM outputs, going beyond the capabilities of existing HPO approaches. Our findings suggest that LLMs are a promising tool for improving efficiency in the traditional decision-making problem of hyperparameter optimization.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper explores how to use large language models (LLMs) for hyperparameter optimization (HPO). Specifically, it addresses the following issues:

1. **Using LLMs for Hyperparameter Optimization**: The study investigates how to prompt LLMs to recommend hyperparameter settings and evaluates the effectiveness of these settings under a fixed search budget.
2. **Improving the Efficiency of Traditional HPO Methods**: The paper points out that traditional HPO methods (such as random search and Bayesian optimization) have limitations, including reliance on manually designed search spaces and poor performance in the initial search phase. The researchers therefore explore whether LLMs can serve as a more efficient tool that addresses these limitations.
3. **Extending the Capabilities of HPO**: Beyond traditional hyperparameter configurations, the study proposes a more flexible approach: allowing the LLM to generate training code (e.g., code written in PyTorch), so that the model structure and other related parameters are adjusted automatically.
4. **Evaluating Performance in Different Scenarios**: The paper evaluates LLMs on standard benchmark datasets and also tests them on low-dimensional optimization problems and code generation tasks to verify their generality and flexibility.

In summary, the core objective of the paper is to evaluate LLMs as a tool for hyperparameter optimization across different application scenarios, and in particular to determine whether they can match or surpass traditional methods under limited search budgets. The study also discusses how LLMs can extend to more flexible forms of hyperparameter configuration, such as automatically generating training code.
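As a rough illustration of the budget-constrained loop summarized above, the following sketch feeds the history of past trials into a text prompt and asks a proposer for the next configuration. It is not the authors' implementation: `build_prompt`, `run_hpo`, `toy_loss`, and `random_search_proposer` are hypothetical names, the prompt wording is invented for illustration, the objective is a toy quadratic rather than a real training run, and the proposer is a random-search stand-in that ignores the prompt. An actual LLM call would take its place.

```python
import json
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
History = List[Tuple[Config, float]]


def build_prompt(history: History) -> str:
    """Format past (config, loss) pairs as text for the proposer (hypothetical wording)."""
    lines = ["You are tuning hyperparameters. Lower validation loss is better."]
    for i, (cfg, loss) in enumerate(history, start=1):
        lines.append(f"Trial {i}: config={json.dumps(cfg)} loss={loss:.4f}")
    lines.append('Reply with the next config as JSON, e.g. {"log10_lr": -3.0}.')
    return "\n".join(lines)


def toy_loss(cfg: Config) -> float:
    """Stand-in for a training run: a quadratic minimized at log10_lr = -2.5."""
    return (cfg["log10_lr"] + 2.5) ** 2


def run_hpo(propose: Callable[[str], str], budget: int) -> Tuple[Config, float, History]:
    """Query the proposer `budget` times, evaluating each suggested config."""
    history: History = []
    for _ in range(budget):
        reply = propose(build_prompt(history))  # an LLM call would go here
        cfg = json.loads(reply)
        history.append((cfg, toy_loss(cfg)))
    best_cfg, best_loss = min(history, key=lambda pair: pair[1])
    return best_cfg, best_loss, history


def random_search_proposer(seed: int = 0) -> Callable[[str], str]:
    """Baseline proposer: ignores the prompt and samples log10_lr uniformly."""
    rng = random.Random(seed)

    def propose(prompt: str) -> str:
        return json.dumps({"log10_lr": rng.uniform(-5.0, 0.0)})

    return propose


best_cfg, best_loss, history = run_hpo(random_search_proposer(seed=0), budget=10)
print(best_cfg, best_loss)
```

Because the proposer is just a function from prompt text to a JSON reply, a random-search baseline and an LLM-backed proposer can be compared under the same budget by swapping the argument passed to `run_hpo`.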