Abstract: We introduce a novel framework called combinatorial logistic bandits (CLogB), where in each round a subset of base arms (called the super arm) is selected, with the outcome of each base arm being binary and its expectation following a logistic parametric model. The feedback is governed by a general arm triggering process. Our study covers CLogB with reward functions satisfying two smoothness conditions, capturing application scenarios such as online content delivery, online learning to rank, and dynamic channel allocation. We first propose a simple yet efficient algorithm, CLogUCB, which uses a variance-agnostic exploration bonus. Under the 1-norm triggering probability modulated (TPM) smoothness condition, CLogUCB achieves a regret bound of $\tilde{O}(d\sqrt{\kappa KT})$, where $\tilde{O}$ ignores logarithmic factors, $d$ is the dimension of the feature vector, $\kappa$ represents the nonlinearity of the logistic model, and $K$ is the maximum number of base arms a super arm can trigger. This result improves on prior work by a factor of $\tilde{O}(\sqrt{\kappa})$. We then enhance CLogUCB with a variance-adaptive version, VA-CLogUCB, which attains a regret bound of $\tilde{O}(d\sqrt{KT})$ under the same 1-norm TPM condition, improving by another factor of $\tilde{O}(\sqrt{\kappa})$. VA-CLogUCB shows even greater promise under the stronger triggering probability and variance modulated (TPVM) condition, achieving a leading $\tilde{O}(d\sqrt{T})$ regret and thus removing the additional dependency on the action size $K$. Furthermore, we improve the computational efficiency of VA-CLogUCB by eliminating the nonconvex optimization process when the context feature map is time-invariant, while maintaining the tight $\tilde{O}(d\sqrt{T})$ regret. Finally, experiments on synthetic and real-world datasets demonstrate the superior performance of our algorithms compared to benchmark algorithms.
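To make the setting concrete, the following is a minimal sketch of one CLogB round, not the CLogUCB or VA-CLogUCB algorithm itself. It assumes a hypothetical known parameter `theta_star`, a toy feature set, a plain top-k selection rule standing in for the combinatorial oracle, and that every selected base arm is triggered. A learning algorithm would not know `theta_star` and would replace the true means with optimistic estimates.

```python
import math
import random


def sigmoid(z):
    """Logistic link function."""
    return 1.0 / (1.0 + math.exp(-z))


def expected_outcomes(features, theta):
    """Mean of each base arm's binary outcome under a logistic model."""
    return [sigmoid(sum(x_j * t_j for x_j, t_j in zip(x, theta)))
            for x in features]


def select_super_arm(scores, k):
    """Pick the k highest-scoring base arms (illustrative top-k oracle)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def play_round(features, theta, super_arm, rng):
    """Draw Bernoulli outcomes for the chosen base arms.

    Assumption: all selected arms are triggered, so feedback is
    observed for every arm in the super arm.
    """
    means = expected_outcomes(features, theta)
    return {i: int(rng.random() < means[i]) for i in super_arm}


# Hypothetical toy instance: d = 2 features, 4 base arms, K = 2.
rng = random.Random(0)
theta_star = [1.0, -0.5]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]

means = expected_outcomes(features, theta_star)
super_arm = select_super_arm(means, k=2)
feedback = play_round(features, theta_star, super_arm, rng)
```

Here `feedback` maps each selected base arm to its observed binary outcome, which is the information a learner would use to update its estimate of the logistic parameter.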

Combinatorial Logistic Bandits
