Abstract: We introduce a novel framework called combinatorial logistic bandits (CLogB), where in each round a subset of base arms (called the super arm) is selected, with the outcome of each base arm being binary and its expectation following a logistic parametric model. The feedback is governed by a general arm triggering process. Our study covers CLogB with reward functions satisfying two smoothness conditions, capturing application scenarios such as online content delivery, online learning to rank, and dynamic channel allocation. We first propose a simple yet efficient algorithm, CLogUCB, utilizing a variance-agnostic exploration bonus. Under the 1-norm triggering probability modulated (TPM) smoothness condition, CLogUCB achieves a regret bound of $\tilde{O}(d\sqrt{\kappa KT})$, where $\tilde{O}$ ignores logarithmic factors, $d$ is the dimension of the feature vector, $\kappa$ represents the nonlinearity of the logistic model, and $K$ is the maximum number of base arms a super arm can trigger. This result improves on prior work by a factor of $\tilde{O}(\sqrt{\kappa})$. We then enhance CLogUCB with a variance-adaptive version, VA-CLogUCB, which attains a regret bound of $\tilde{O}(d\sqrt{KT})$ under the same 1-norm TPM condition, an improvement by a further factor of $\tilde{O}(\sqrt{\kappa})$. VA-CLogUCB shows even greater promise under the stronger triggering probability and variance modulated (TPVM) condition, achieving a leading $\tilde{O}(d\sqrt{T})$ regret and thus removing the additional dependency on the action size $K$. Furthermore, when the context feature map is time-invariant, we improve the computational efficiency of VA-CLogUCB by eliminating the nonconvex optimization process while maintaining the tight $\tilde{O}(d\sqrt{T})$ regret. Finally, experiments on synthetic and real-world datasets demonstrate the superior performance of our algorithms compared to benchmark algorithms.
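To make the setting concrete, the following is a toy sketch of a single round of a combinatorial logistic bandit. It is not the paper's CLogUCB: the elliptical bonus form, the top-$K$ selection rule, the semi-bandit feedback (the simplest case of arm triggering, where every selected base arm is observed), and all names such as `select_super_arm`, `bonus_scale`, and `V_inv` are assumptions made purely for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic link mapping a linear score to a Bernoulli mean."""
    return 1.0 / (1.0 + np.exp(-z))

def select_super_arm(features, theta_hat, V_inv, bonus_scale, K):
    """Pick the K base arms with the largest optimistic mean estimates.

    Assumption: a generic optimism bonus added in logit space; the paper's
    actual bonus terms are not given in the abstract.
    """
    logits = features @ theta_hat
    # Elliptical width ||x_i||_{V^{-1}} for every base arm i.
    widths = np.sqrt(np.einsum("ij,jk,ik->i", features, V_inv, features))
    ucb = sigmoid(logits + bonus_scale * widths)
    # Hypothetical oracle: take the top-K arms by optimistic mean.
    return np.argsort(-ucb)[:K], ucb

rng = np.random.default_rng(0)
d, m, K = 3, 8, 2                          # feature dim, base arms, super-arm size
theta_star = np.array([1.0, -0.5, 0.25])   # unknown parameter (illustrative values)
features = rng.normal(size=(m, d))         # one feature vector per base arm
theta_hat = np.zeros(d)                    # estimate before any data is seen
V_inv = np.eye(d)                          # inverse of a regularized design matrix

super_arm, ucb = select_super_arm(features, theta_hat, V_inv, 1.0, K)
# Binary outcomes of the selected base arms, drawn from the logistic model.
outcomes = rng.binomial(1, sigmoid(features[super_arm] @ theta_star))
```

In a full learner, the observed `outcomes` would be used to update `theta_hat` and the design matrix before the next round; that update step is omitted here.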

Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control

A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits

A Simple Approach For Non-Stationary Linear Bandits

Non-Stationary Latent Auto-Regressive Bandits

Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

Non-stationary Bandits with Habituation and Recovery Dynamics and Knapsack Constraints

UCB algorithms for multi-armed bandits: Precise regret and adaptive inference

Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes

Diminishing Exploration: A Minimalist Approach to Piecewise Stationary Multi-Armed Bandits

Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards

Multiscale Non-stationary Stochastic Bandits

Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

Combinatorial Stochastic-Greedy Bandit

Multi-Armed Bandit Strategies for Non-Stationary Reward Distributions and Delayed Feedback Processes

Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem

Stochastic Graphical Bandits with Heavy-Tailed Rewards

Thompson Sampling in Switching Environments with Bayesian Online Change Point Detection

Combinatorial Logistic Bandits

BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits