Abstract: We study the Linear Contextual Bandit problem in the hybrid reward setting. In this setting, every arm's reward model contains arm-specific parameters in addition to parameters shared across the reward models of all arms. This setting can be reduced to two closely related settings: (a) Shared, with no arm-specific parameters, and (b) Disjoint, with only arm-specific parameters. The reduction enables the application of two popular state-of-the-art algorithms, $\texttt{LinUCB}$ and $\texttt{DisLinUCB}$ (Algorithm 1 in (Li et al. 2010)). When the arm features are stochastic and satisfy a popular diversity condition, we provide new regret analyses for both algorithms, significantly improving on their known regret guarantees. Our novel analysis critically exploits the hybrid reward structure and the diversity condition. Moreover, we introduce a new algorithm, $\texttt{HyLinUCB}$, that crucially modifies $\texttt{LinUCB}$ (using a new exploration coefficient) to account for sparsity in the hybrid setting. Under the same diversity assumptions, we prove that $\texttt{HyLinUCB}$ also incurs only $O(\sqrt{T})$ regret over $T$ rounds. We perform extensive experiments on synthetic and real-world datasets, demonstrating strong empirical performance of $\texttt{HyLinUCB}$. When the number of arm-specific parameters is much larger than the number of shared parameters, we observe that $\texttt{DisLinUCB}$ incurs the lowest regret. In this case, the regret of $\texttt{HyLinUCB}$ is the second best and extremely competitive with that of $\texttt{DisLinUCB}$. In all other situations, including our real-world dataset, $\texttt{HyLinUCB}$ has significantly lower regret than $\texttt{LinUCB}$, $\texttt{DisLinUCB}$, and the other SOTA baselines we considered. We also empirically observe that the regret of $\texttt{HyLinUCB}$ grows much more slowly with the number of arms than that of the baselines, making it suitable even for very large action spaces.
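As a rough illustration of the hybrid reward model and its reduction to the shared setting, the sketch below assumes the common formulation in which arm $a$ has expected reward $x^\top\theta + z^\top\beta_a$, with $\theta$ shared and $\beta_a$ arm specific. The abstract does not spell out this form, so the formulation and the names `hybrid_reward` and `lift_to_shared` are illustrative assumptions and not the paper's notation. The lifted feature vector is zero outside one arm's block, which is one way to read the sparsity mentioned above.

```python
# Minimal sketch (assumed formulation, not taken from the paper):
# expected reward of arm a = <x, theta> + <z, beta_a>.

def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(ui * vi for ui, vi in zip(u, v))

def hybrid_reward(x, z, theta, beta_a):
    """Expected reward with a shared part (theta) and an arm-specific part (beta_a)."""
    return dot(x, theta) + dot(z, beta_a)

def lift_to_shared(x, z, arm, num_arms):
    """Embed (x, z) for the given arm into one long sparse feature vector.

    Layout: [x | block for arm 0 | block for arm 1 | ...], where only the
    block belonging to `arm` holds z and every other block is zero.
    """
    d2 = len(z)
    lifted = list(x) + [0.0] * (num_arms * d2)
    start = len(x) + arm * d2
    lifted[start:start + d2] = z
    return lifted

# Hypothetical parameters: 2 shared dimensions, 3 arms, 2 arm-specific dimensions.
theta = [0.5, -1.0]
betas = [[1.0, 2.0], [0.0, 3.0], [-1.0, 0.5]]

# In the lifted (shared) view there is a single parameter vector.
shared_params = theta + [b for beta in betas for b in beta]

x, z, arm = [1.0, 2.0], [0.5, -0.5], 1
lifted = lift_to_shared(x, z, arm, num_arms=len(betas))

# Both views give the same expected reward.
reward_hybrid = hybrid_reward(x, z, theta, betas[arm])
reward_shared = dot(lifted, shared_params)
```

Under this assumed layout, the lifted dimension grows linearly with the number of arms while each feature vector has at most `len(x) + len(z)` nonzero entries.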

A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting

Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

Combinatorial Multi-Armed Bandit: General Framework and Applications.

Bandits with Mean Bounds

Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms

UCB algorithms for multi-armed bandits: Precise regret and adaptive inference

Non-stationary Bandits with Habituation and Recovery Dynamics and Knapsack Constraints

Stochastic Conservative Contextual Linear Bandits

An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints

Online Algorithms for the Multi-Armed Bandit Problem with Markovian Rewards

Bandits with Concave Aggregated Reward

Congested Bandits: Optimal Routing via Short-term Resets

The Fragility of Optimized Bandit Algorithms

Learning Modular Safe Policies in the Bandit Setting with Application to Adaptive Clinical Trials

Adaptive Algorithm for Multi-Armed Bandit Problem with High-Dimensional Covariates

Restless Linear Bandits

A One-Size-Fits-All Solution to Conservative Bandit Problems

Bandit Submodular Maximization for Multi-Robot Coordination in Unpredictable and Partially Observable Environments

Combinatorial Bandits under Strategic Manipulations

Linear Contextual Bandits with Hybrid Payoff: Revisited