Sequential Large Language Model-Based Hyper-Parameter Optimization

Kanan Mahammadli
2024-10-27
Abstract: This study introduces SLLMBO, an innovative framework that leverages Large Language Models (LLMs) for hyperparameter optimization (HPO), incorporating dynamic search space adaptability, enhanced parameter landscape exploitation, and a novel hybrid LLM-Tree-structured Parzen Estimator (LLM-TPE) sampler. By addressing limitations in recent fully LLM-based methods and traditional Bayesian Optimization (BO), SLLMBO achieves more robust optimization. The benchmark evaluates multiple LLMs, including GPT-3.5-turbo, GPT-4o, Claude-Sonnet-3.5, and Gemini-1.5-flash, extending prior work beyond GPT-3.5 and GPT-4 and establishing SLLMBO as the first framework to benchmark a diverse set of LLMs for HPO. By integrating LLMs' established strengths in parameter initialization with the exploitation abilities demonstrated in this study, alongside TPE's exploration capabilities, the LLM-TPE sampler achieves a balanced exploration-exploitation trade-off, reduces API costs, and mitigates premature early stopping for more effective parameter searches. Across 14 tabular tasks in classification and regression, the LLM-TPE sampler outperformed fully LLM-based methods and achieved superior results over BO methods in 9 tasks. Testing early stopping in budget-constrained scenarios further demonstrated competitive performance, indicating that LLM-based methods generally benefit from extended iterations for optimal results. This work lays the foundation for future research exploring open-source LLMs, reproducibility of LLM results in HPO, and benchmarking SLLMBO on complex datasets, such as image classification, segmentation, and machine translation.
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper targets shortcomings of existing hyperparameter optimization (HPO) methods in automation, search space adaptability, and exploration-exploitation balance. Specifically:

1. **Limitations of Traditional Bayesian Optimization (BO)**:
   - Requires human experts to define the parameters to be optimized and their possible value ranges.
   - The search space remains fixed throughout the optimization process and cannot be dynamically adjusted.
   - The optimization process usually starts with random initial parameters, which is inefficient and computationally expensive.
   - For each new task, BO needs to optimize from scratch.
2. **Limitations of Current Large Language Model (LLM)-Based Methods**:
   - Mainly use OpenAI's models, without extensive evaluation of other LLMs.
   - Are constrained by input token limits, which cap the number of iterations.
   - Lack a dynamic search space adaptation mechanism, which may lead to premature convergence or to missing the optimal solution.
   - Lack the ability to manage the exploration-exploitation balance autonomously.

To address these issues, the paper proposes **SLLMBO** (Sequential Large Language Model-Based Optimization), a framework that combines the strengths of LLMs with the Tree-structured Parzen Estimator (TPE) sampling method to achieve dynamic search space adaptation, enhanced exploitation of the parameter landscape, and a balanced exploration-exploitation strategy. With these improvements, SLLMBO aims to make hyperparameter optimization more robust and to perform well across multiple benchmarks.
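
The idea of mixing an exploiting proposer with an exploring sampler can be illustrated with a minimal sketch. This is not the paper's implementation, and everything in it is an assumption made for illustration: the search space `SPACE`, the toy `objective`, the 0.5 mixing probability `llm_prob`, and the `patience` rule for early stopping. The LLM call is replaced by a placeholder that perturbs the best trial so far, and TPE is replaced by plain uniform sampling. Dynamic search space adaptation is left out.

```python
import random

# Hypothetical fixed search space: name -> (low, high). Not taken from the paper.
SPACE = {"learning_rate": (0.001, 0.3), "max_depth": (2.0, 12.0)}


def objective(params):
    """Toy stand-in for a validation loss (lower is better)."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["max_depth"] - 6.0) ** 2 / 100.0


def llm_propose(history, rng):
    """Placeholder for an LLM call: perturbs the best trial so far (exploitation)."""
    if not history:
        # With no history, start from the midpoint of each range.
        return {k: (lo + hi) / 2 for k, (lo, hi) in SPACE.items()}
    best_params, _ = min(history, key=lambda t: t[1])
    proposal = {}
    for k, (lo, hi) in SPACE.items():
        step = 0.1 * (hi - lo) * rng.uniform(-1, 1)
        proposal[k] = min(hi, max(lo, best_params[k] + step))  # clamp to bounds
    return proposal


def explore_propose(rng):
    """Stand-in for the TPE sampler: plain uniform sampling (exploration)."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in SPACE.items()}


def hybrid_search(n_trials=30, llm_prob=0.5, patience=10, seed=0):
    """Pick a proposer at random per trial; stop after `patience` stale trials."""
    rng = random.Random(seed)
    history = []  # list of (params, score) pairs
    best = float("inf")
    stale = 0
    for _ in range(n_trials):
        if rng.random() < llm_prob:
            params = llm_propose(history, rng)
        else:
            params = explore_propose(rng)
        score = objective(params)
        history.append((params, score))
        if score < best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # simple early-stopping rule (assumed, not from the paper)
    return min(history, key=lambda t: t[1]), history


if __name__ == "__main__":
    (best_params, best_score), trials = hybrid_search()
    print(len(trials), best_params, best_score)
```

In a real setup, `llm_propose` would send the trial history to an LLM and parse its suggested parameters, and `explore_propose` would be replaced by an actual TPE sampler. The random per-trial choice here only shows how the two sources of suggestions can share one optimization history.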