Abstract: We introduce a novel framework called combinatorial logistic bandits (CLogB), where in each round a subset of base arms (called the super arm) is selected, the outcome of each base arm is binary, and its expectation follows a logistic parametric model. The feedback is governed by a general arm triggering process. Our study covers CLogB with reward functions satisfying two smoothness conditions, capturing application scenarios such as online content delivery, online learning to rank, and dynamic channel allocation. We first propose a simple yet efficient algorithm, CLogUCB, which uses a variance-agnostic exploration bonus. Under the 1-norm triggering probability modulated (TPM) smoothness condition, CLogUCB achieves a regret bound of $\tilde{O}(d\sqrt{\kappa KT})$, where $\tilde{O}$ ignores logarithmic factors, $d$ is the dimension of the feature vector, $\kappa$ represents the nonlinearity of the logistic model, and $K$ is the maximum number of base arms a super arm can trigger. This result improves on prior work by a factor of $\tilde{O}(\sqrt{\kappa})$. We then enhance CLogUCB with a variance-adaptive version, VA-CLogUCB, which attains a regret bound of $\tilde{O}(d\sqrt{KT})$ under the same 1-norm TPM condition, an improvement by another factor of $\tilde{O}(\sqrt{\kappa})$. VA-CLogUCB shows even greater promise under the stronger triggering probability and variance modulated (TPVM) condition, achieving a leading $\tilde{O}(d\sqrt{T})$ regret and thus removing the additional dependency on the action size $K$. Furthermore, we improve the computational efficiency of VA-CLogUCB by eliminating the nonconvex optimization process when the context feature map is time-invariant, while maintaining the tight $\tilde{O}(d\sqrt{T})$ regret. Finally, experiments on synthetic and real-world datasets demonstrate the superior performance of our algorithms compared to benchmark algorithms.
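
The selection step described in the abstract (a logistic model for each base arm's mean, plus an exploration bonus) can be illustrated with a minimal sketch. It is not the paper's CLogUCB: the abstract does not specify the form of the bonus, the estimator, or the reward function, so `theta_hat`, `bonus`, the example features, and the top-k selection rule below are all hypothetical placeholders chosen for illustration.

```python
import math

def sigmoid(z):
    """Logistic link: maps a linear score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def optimistic_means(features, theta_hat, bonus):
    """Optimistic estimate of each base arm's mean outcome.

    features:  one d-dimensional feature vector per base arm
    theta_hat: current parameter estimate (assumed given; fitting is not shown)
    bonus:     per-arm exploration bonus added to the linear score
               (hypothetical values, not the paper's bonus)
    """
    ucb = []
    for x, b in zip(features, bonus):
        score = sum(xi * ti for xi, ti in zip(x, theta_hat))
        ucb.append(sigmoid(score + b))
    return ucb

def select_super_arm(ucb, k):
    """Assumed oracle: pick the k base arms with the largest optimistic means."""
    order = sorted(range(len(ucb)), key=lambda i: ucb[i], reverse=True)
    return sorted(order[:k])

# Toy round with 4 base arms in d = 2 dimensions (made-up numbers).
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.2]]
theta_hat = [0.8, -0.3]
bonus = [0.1, 0.9, 0.2, 0.1]

ucb = optimistic_means(features, theta_hat, bonus)
chosen = select_super_arm(ucb, k=2)
print(chosen)
```

In this toy round the second arm is selected despite its low estimated score because its bonus is large, which is the exploration effect a UCB-style rule is meant to produce.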

Comparator-adaptive Convex Bandits

Adaptive Bandit Convex Optimization with Heterogeneous Curvature

Risk-Averse Stochastic Convex Bandit

Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback

Adversarial Combinatorial Bandits with Switching Costs

Bandit Convex Optimization in Non-stationary Environments

Convex Methods for Constrained Linear Bandits

Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits

Combinatorial Bandits with Relative Feedback

Adaptive Regret for Bandits Made Possible: Two Queries Suffice

Second Order Methods for Bandit Optimization and Control

Adaptive Regret of Convex and Smooth Functions

Online Newton Method for Bandit Convex Optimisation

Projection-Free Bandit Convex Optimization over Strongly Convex Sets

Compliance-Aware Bandits

Federated Online and Bandit Convex Optimization

Simple Combinatorial Algorithms for Combinatorial Bandits - Corruptions and Approximations

Bandit Convex Optimisation

A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback

Combinatorial Logistic Bandits

Regret Analysis for Continuous Dueling Bandit