RLBOF: Reinforcement Learning from Bayesian Optimization Feedback

Hailong Huang, Xiubo Liang, Quanwei Zhang, Hongzhi Wang, Xiangdong Li
DOI: https://doi.org/10.1109/ijcnn60899.2024.10650719
2024-01-01
Abstract: Bayesian Optimization is a powerful technique for black-box optimization problems, with applications in various domains. Meta-Bayesian Optimization (Meta-BO) is an approach designed to improve data efficiency by leveraging information from related tasks. In recent years, there has been notable progress in Meta-BO, particularly in surrogate models and acquisition functions that utilize data from related tasks. However, these advances have predominantly focused on one of these two aspects of Bayesian optimization at a time, leaving untapped potential in their integration. We propose a novel approach that enables the surrogate model to effectively integrate the acquisition function for Bayesian optimization tasks. This allows the surrogate model to go beyond mere function approximation, addressing the aforementioned problem. Taking inspiration from large language models that receive feedback in actual human dialogue tasks, our approach pre-trains a neural process surrogate model and then uses feedback obtained from real Bayesian optimization scenarios to enhance its Bayesian optimization capability. To achieve this, we extend Proximal Policy Optimization to utilize feedback derived from Bayesian optimization, incentivizing the pre-trained surrogate model. We evaluate our approach thoroughly across diverse models and various benchmark functions. Remarkably, even with minimal incentives, the models exhibit significant advancements in Bayesian optimization, highlighting the effectiveness and robust generalization ability of our proposed method.
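For readers unfamiliar with Proximal Policy Optimization, the following is a minimal sketch of the standard PPO clipped surrogate objective that the abstract says is extended. It is not the authors' implementation: the abstract does not specify how Bayesian optimization feedback is turned into a reward, so the function name `ppo_clipped_objective` and the `advantages` input (imagined here as derived from something like the improvement in the best observed value) are hypothetical and used only for illustration.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the chosen query points under
    the updated and the pre-update model (assumed inputs).
    advantages: per-step advantage estimates; in this sketch they stand in
    for feedback from a Bayesian optimization run (an assumption, not the
    paper's definition).
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated and the old model.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps] to limit the update size.
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the two candidate terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # The ratio is 2.0 here, so a positive advantage is capped at 1 + clip_eps.
    print(ppo_clipped_objective([math.log(2.0)], [0.0], [1.0]))
```

The clipping keeps the fine-tuned model close to the pre-trained one at each update, which is one plausible reason a method of this kind could improve with only a small amount of feedback.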