Abstract: We study the Linear Contextual Bandit problem in the hybrid reward setting. In this setting, every arm's reward model contains arm-specific parameters in addition to parameters shared across the reward models of all the arms. We can reduce this setting to two closely related settings, (a) Shared (no arm-specific parameters) and (b) Disjoint (only arm-specific parameters), enabling the application of two popular state-of-the-art algorithms, $\texttt{LinUCB}$ and $\texttt{DisLinUCB}$ (Algorithm 1 in (Li et al. 2010)). When the arm features are stochastic and satisfy a popular diversity condition, we provide new regret analyses for both algorithms, significantly improving on the known regret guarantees of these algorithms. Our novel analysis critically exploits the hybrid reward structure and the diversity condition. Moreover, we introduce a new algorithm, $\texttt{HyLinUCB}$, that crucially modifies $\texttt{LinUCB}$ (using a new exploration coefficient) to account for sparsity in the hybrid setting. Under the same diversity assumptions, we prove that $\texttt{HyLinUCB}$ also incurs only $O(\sqrt{T})$ regret for $T$ rounds. We perform extensive experiments on synthetic and real-world datasets, demonstrating strong empirical performance of $\texttt{HyLinUCB}$. When the number of arm-specific parameters is much larger than the number of shared parameters, we observe that $\texttt{DisLinUCB}$ incurs the lowest regret. In this case, the regret of $\texttt{HyLinUCB}$ is the second best and highly competitive with that of $\texttt{DisLinUCB}$. In all other situations, including our real-world dataset, $\texttt{HyLinUCB}$ has significantly lower regret than $\texttt{LinUCB}$, $\texttt{DisLinUCB}$ and the other SOTA baselines we considered. We also empirically observe that the regret of $\texttt{HyLinUCB}$ grows much more slowly with the number of arms than that of the baselines, making it suitable even for very large action spaces.
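The hybrid reward structure and its reduction to the Shared setting can be illustrated with a small sketch. The abstract does not spell out the notation, so everything below is an assumption made for illustration: the shared parameter vector `beta`, the arm-specific vectors `thetas`, the feature vectors `x` and `z`, and the helper names `hybrid_reward` and `to_shared_features` are all hypothetical. The sketch embeds the features of one arm into a single long vector that is zero in every other arm's block, which is one natural source of the sparsity the abstract mentions.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def hybrid_reward(x, z, beta, theta_a):
    """Expected reward of one arm: shared part plus arm-specific part (assumed form)."""
    return dot(x, beta) + dot(z, theta_a)


def to_shared_features(x, z, arm, num_arms):
    """Embed (x, z) for `arm` into one vector of length d1 + num_arms * d2.

    The first d1 entries carry the shared features; the block belonging to
    `arm` carries z; every other arm's block stays zero.
    """
    d1, d2 = len(x), len(z)
    phi = [0.0] * (d1 + num_arms * d2)
    phi[:d1] = x
    start = d1 + arm * d2
    phi[start:start + d2] = z
    return phi


rng = random.Random(0)
num_arms, d1, d2 = 3, 2, 4  # illustrative sizes only

beta = [rng.gauss(0, 1) for _ in range(d1)]
thetas = [[rng.gauss(0, 1) for _ in range(d2)] for _ in range(num_arms)]

# Stacked parameter vector for the reduced (Shared) problem.
w = beta + [value for theta in thetas for value in theta]

arm = 1
x = [rng.gauss(0, 1) for _ in range(d1)]
z = [rng.gauss(0, 1) for _ in range(d2)]
phi = to_shared_features(x, z, arm, num_arms)

# The reduced linear model reproduces the hybrid reward.
gap = abs(dot(phi, w) - hybrid_reward(x, z, beta, thetas[arm]))
```

In this sketch the embedded vector has at most `d1 + d2` nonzero entries out of `d1 + num_arms * d2`, so the reduced problem grows in dimension with the number of arms while each feature vector stays sparse.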

Networked Bandits With Disjoint Linear Payoffs

A Gang of Bandits

Distributed Bandits with Heterogeneous Agents

Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits

Doubly Adversarial Federated Bandits

Random Walk Bandits

Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs

Multi-dueling Bandits with Dependent Arms

Neural Dueling Bandits

Portfolio Choices with Orthogonal Bandit Learning

Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem

Networked Restless Bandits with Positive Externalities

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

Multi-Armed Bandits with Network Interference

Learning Contextual Bandits in a Non-stationary Environment

Contextual Bandits with Arm Request Costs and Delays

Linear Contextual Bandits with Hybrid Payoff: Revisited

Partially Observable Contextual Bandits with Linear Payoffs

Bayesian Incentive-Compatible Bandit Exploration

Contextual Bandits with Random Projection