Abstract: We consider the combinatorial multi-armed bandit (CMAB) problem, where the reward function is nonlinear. In this setting, the agent chooses a batch of arms on each round and receives feedback from each arm of the batch. The reward that the agent aims to maximize is a function of the selected arms and their expectations. In many applications, the reward function is highly nonlinear, and the performance of existing algorithms relies on a global Lipschitz constant to encapsulate the function's nonlinearity. This may lead to loose regret bounds, since by itself, a large gradient does not necessarily cause a large regret, but only in regions where the uncertainty in the reward's parameters is high. To overcome this problem, we introduce a new smoothness criterion, which we term *Gini-weighted smoothness*, that takes into account both the nonlinearity of the reward and concentration properties of the arms. We show that a linear dependence of the regret in the batch size in existing algorithms can be replaced by this smoothness parameter. This, in turn, leads to much tighter regret bounds when the smoothness parameter is batch-size independent. For example, in the probabilistic maximum coverage (PMC) problem, which has many applications, including influence maximization and diverse recommendations, we achieve dramatic improvements in the upper bounds. We also prove matching lower bounds for the PMC problem and show that our algorithm is tight, up to a logarithmic factor in the problem's parameters.

What problem does this paper attempt to address?

This paper addresses the challenges posed by nonlinear reward functions in the combinatorial multi-armed bandit (CMAB) problem. It focuses on the following points:

1. **Limitations of existing algorithms**: Existing CMAB algorithms usually rely on a global Lipschitz constant to capture the nonlinearity of the reward function. This can lead to loose regret bounds, because a large gradient causes large regret only in regions where the uncertainty in the reward's parameters is high, not everywhere.
2. **Introduction of a new smoothness criterion**: To overcome this problem, the authors introduce a new smoothness criterion, called Gini-weighted smoothness. It takes into account both the nonlinearity of the reward function and the concentration properties of the arms.
3. **Improvement of regret bounds**: Using Gini-weighted smoothness, the authors propose an Upper Confidence Bound (UCB) strategy based on the empirical Bernstein inequality. This strategy replaces the linear dependence of the regret bound on the batch size with the smoothness parameter, which gives much tighter bounds when that parameter is independent of the batch size.
4. **Application examples**: For the probabilistic maximum coverage (PMC) problem, the authors show that the new method significantly improves the regret upper bound, and they prove matching lower bounds up to a logarithmic factor. The PMC problem arises in applications such as influence maximization and diverse recommendations.

### Formula summary

- **Gini-weighted smoothness condition**: \[ \sum_{i = 1}^L x_i(1 - x_i)\left(\frac{\partial f(A; x)}{\partial x_i}\right)^2\leq\gamma_g \] where \(\gamma_g\) is the smoothness parameter.
- **UCB index**: \[ q_{ij}(t)=\min\left\{1,\hat{p}_{ij}(t - 1)+\sqrt{\frac{6\hat{V}_{ij}(t - 1)\log t}{N_j(t - 1)}}+\frac{9\log t}{N_j(t - 1)}\right\} \]
- **Regret bound theorem**: \[ R(T)\leq\left[8640\gamma_g^2\bar{M}^2L\sum_{j = 1}^L\frac{1}{\Delta_{j,\min}}+340\gamma_\infty\bar{M}L\sum_{j = 1}^L\left(1+\log\frac{\Delta_{j,\max}}{\Delta_{j,\min}}\right)\right]\left\lceil\log\frac{K}{1.61}\right\rceil^2\log T+L\Delta_{\max}\left(1+\frac{M^2\pi^2}{3}\right) \]

With these results, the paper gives tighter regret guarantees for nonlinear reward functions and illustrates them on the PMC problem.
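To make the UCB index and the Gini-weighted smoothness condition from the formula summary more concrete, the following is a minimal Python sketch, not the authors' implementation. The function names, the finite-difference gradient, the treatment of unobserved arms, and the toy single-target coverage reward `toy_coverage` are all assumptions introduced for illustration. The constants 6 and 9 are taken from the UCB index as written above.

```python
import math
from typing import Callable, Sequence


def bernstein_ucb_index(p_hat: float, v_hat: float, n_pulls: int, t: int) -> float:
    """Empirical-Bernstein style index, following the formula summary:
    min{1, p_hat + sqrt(6 * v_hat * log t / N) + 9 * log t / N}.
    """
    if n_pulls == 0:
        # Assumption: an arm with no observations gets the maximal index.
        return 1.0
    log_t = math.log(t)  # requires t >= 1
    bonus = math.sqrt(6.0 * v_hat * log_t / n_pulls) + 9.0 * log_t / n_pulls
    return min(1.0, p_hat + bonus)


def gini_weighted_sum(f: Callable[[Sequence[float]], float],
                      x: Sequence[float], h: float = 1e-6) -> float:
    """Left-hand side of the smoothness condition as written above:
    sum_i x_i (1 - x_i) (df/dx_i)^2, with df/dx_i estimated by
    central finite differences (an illustrative choice).
    """
    total = 0.0
    for i, xi in enumerate(x):
        up = list(x)
        down = list(x)
        up[i] = xi + h
        down[i] = xi - h
        grad_i = (f(up) - f(down)) / (2.0 * h)
        total += xi * (1.0 - xi) * grad_i ** 2
    return total


def toy_coverage(x: Sequence[float]) -> float:
    """Hypothetical toy reward: probability that at least one of the
    selected arms covers a single target, 1 - prod_i (1 - x_i)."""
    miss = 1.0
    for xi in x:
        miss *= 1.0 - xi
    return 1.0 - miss


if __name__ == "__main__":
    # Index for an arm with empirical mean 0.5 and variance 0.25.
    print(bernstein_ucb_index(p_hat=0.5, v_hat=0.25, n_pulls=10_000, t=100))
    # Gini-weighted sum for growing batches of arms with mean 0.5.
    for batch_size in (2, 4, 8):
        print(batch_size, gini_weighted_sum(toy_coverage, [0.5] * batch_size))
```

In this toy reward, each partial derivative is the product of the other arms' miss probabilities, so the weighted sum shrinks as the batch grows. That is one way to see how a smoothness measure of this kind can avoid growing with the batch size, although the actual PMC analysis in the paper is more general than this single-target example.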

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms

Combinatorial Multi-Armed Bandit: General Framework and Applications

Combinatorial Multi-Armed Bandit with General Reward Functions

Multiarmed Bandits Problem Under the Mean-Variance Setting

Tight Bounds for Bandit Combinatorial Optimization

Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications

Combinatorial Multi-Armed Bandit and Its Extension to Probabilistically Triggered Arms

Batch Ensemble for Variance Dependent Regret in Stochastic Bandits

UCB algorithms for multi-armed bandits: Precise regret and adaptive inference

Swimming in curved space or The Baron and the cat

Combinatorial Logistic Bandits

Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds for Martingale Mixtures

Combinatorial Bandits with Linear Constraints: Beyond Knapsacks and Fairness

A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms

Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion

Finite Budget Analysis of Multi-Armed Bandit Problems

Nash Regret Guarantees for Linear Bandits

Batched Dueling Bandits

Batched Lipschitz Bandits

Batched Nonparametric Contextual Bandits