High-Dimensional Bayesian Optimization via Semi-Supervised Learning with Optimized Unlabeled Data Sampling

Yuxuan Yin, Yu Wang, Peng Li
2024-02-04
Abstract: We introduce a novel semi-supervised learning approach, named Teacher-Student Bayesian Optimization ($\texttt{TSBO}$), integrating the teacher-student paradigm into BO to minimize expensive labeled data queries for the first time. $\texttt{TSBO}$ incorporates a teacher model, an unlabeled data sampler, and a student model. The student is trained on unlabeled data locations generated by the sampler, with pseudo labels predicted by the teacher. The interplay between these three components implements a unique selective regularization to the teacher in the form of student feedback. This scheme enables the teacher to predict high-quality pseudo labels, enhancing the generalization of the GP surrogate model in the search space. To fully exploit $\texttt{TSBO}$, we propose two optimized unlabeled data samplers to construct effective student feedback that well aligns with the objective of Bayesian optimization. Furthermore, we quantify and leverage the uncertainty of the teacher-student model for the provision of reliable feedback to the teacher in the presence of risky pseudo-label predictions. $\texttt{TSBO}$ demonstrates significantly improved sample-efficiency in several global optimization tasks under tight labeled data budgets.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the cost of labeled data queries in high-dimensional Bayesian Optimization (BO), where each evaluation of the objective is expensive and the query budget is tight. It proposes Teacher-Student Bayesian Optimization (TSBO), a semi-supervised method that brings the teacher-student paradigm into the BO framework. TSBO combines three components: a teacher model, an unlabeled data sampler, and a student model. The sampler generates unlabeled locations, the teacher assigns them pseudo labels, and the student is trained on those pseudo-labeled points. The student's feedback then acts as a selective regularizer on the teacher, which improves the quality of the pseudo labels and the generalization of the GP surrogate model across the search space. The paper also proposes two optimized unlabeled data samplers, designed so that the student feedback aligns with the BO objective, and it quantifies the uncertainty of the teacher-student model so that feedback stays reliable when pseudo-label predictions are risky. According to the reported experiments, TSBO significantly improves sample efficiency on several global optimization tasks, particularly when the labeled data budget is tight.
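The teacher, sampler, and student interplay described above can be illustrated with a small toy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the teacher and the student are both plain GP posterior means with an RBF kernel, the hypothetical `sample_unlabeled` draws Gaussian perturbations around the best labeled point, the feedback is the student's mean squared error on held-out labeled data, and the "regularization" is reduced to picking the teacher lengthscale with the lowest feedback. The sphere function stands in for an expensive objective.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_mean(x_train, y_train, x_query, lengthscale, noise=1e-3):
    """Posterior mean of a zero-mean GP with an RBF kernel."""
    k = rbf_kernel(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    return rbf_kernel(x_query, x_train, lengthscale) @ np.linalg.solve(k, y_train)


def sample_unlabeled(x_labeled, y_labeled, n, scale, rng):
    """Hypothetical sampler: Gaussian perturbations of the best labeled point."""
    best = x_labeled[np.argmin(y_labeled)]  # minimization convention (assumed)
    return best + scale * rng.standard_normal((n, x_labeled.shape[1]))


def student_feedback(x_tr, y_tr, x_val, y_val, x_unl, teacher_ls, student_ls=1.0):
    """Validation MSE of a student fitted to the teacher's pseudo labels."""
    # Teacher (fitted on labeled training data) labels the unlabeled samples.
    pseudo = gp_mean(x_tr, y_tr, x_unl, teacher_ls)
    # Student sees only pseudo-labeled points, then predicts held-out labels.
    pred = gp_mean(x_unl, pseudo, x_val, student_ls)
    return float(np.mean((pred - y_val) ** 2))


rng = np.random.default_rng(0)
dim = 5
x = rng.uniform(-2.0, 2.0, size=(20, dim))
y = (x**2).sum(axis=1)  # toy objective standing in for an expensive query
x_tr, y_tr, x_val, y_val = x[:14], y[:14], x[14:], y[14:]

x_unl = sample_unlabeled(x_tr, y_tr, n=64, scale=0.3, rng=rng)

# Use student feedback to choose among candidate teacher lengthscales.
candidates = [0.5, 1.0, 2.0, 4.0]
feedback = {
    ls: student_feedback(x_tr, y_tr, x_val, y_val, x_unl, ls) for ls in candidates
}
best_ls = min(feedback, key=feedback.get)
print(feedback, best_ls)
```

The sketch only shows the direction of information flow: labeled data trains the teacher, the teacher labels sampled points, and the student's held-out error flows back to adjust the teacher. The optimized samplers and the uncertainty-aware feedback mentioned in the abstract are not modeled here.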