Online non-monotone diminishing return submodular maximization in the bandit setting

Jiachen Ju, Xiao Wang, Dachuan Xu
DOI: https://doi.org/10.1007/s10898-024-01413-0
2024-06-14
Journal of Global Optimization
Abstract: In this paper, we study online diminishing return submodular (DR-submodular for short) maximization in the bandit setting. Our focus is on problems where the reward functions can be non-monotone, and the constraint set is a general convex set. We first present the Single-sampling Non-monotone Frank-Wolfe algorithm. This algorithm only requires a single call to each reward function, and it computes the stochastic gradient to make it suitable for large-scale settings where full gradient information might not be available. We provide an analysis of the approximation ratio and regret bound of the proposed algorithm. We then propose the Bandit Online Non-monotone Frank-Wolfe algorithm to adjust for problems in the bandit setting, where each reward function returns the function value at a single point. We take advantage of smoothing approximations to reward functions to tackle the challenges posed by the bandit setting. Under mild assumptions, our proposed algorithm can reach -approximation with regret bounded by , where the positive parameter is related to the "safety domain" . To the best of our knowledge, this is the first work to address online non-monotone DR-submodular maximization over a general convex set in the bandit setting.
mathematics, applied; operations research & management science