Abstract: Contextual combinatorial cascading bandit ($C^{3}$-bandit) is a powerful multi-armed bandit framework that balances the tradeoff between exploration and exploitation in the learning process. It captures users’ click behavior well and has been applied in a broad spectrum of real-world applications such as recommender systems and search engines. However, this framework does not provide a performance guarantee for the initial exploration phase. To that end, we propose the conservative contextual combinatorial cascading bandit ($C^{4}$-bandit) model, which addresses this modeling issue. In this problem, the learning agent is given some contexts, recommends a list of items that is not worse than the baseline strategy, and then observes the reward according to some stopping rule. The objective is to maximize the reward while satisfying the safety constraint, i.e., guaranteeing that the algorithm performs at least as well as a baseline strategy. To tackle this new problem, we extend an online learning algorithm, Upper Confidence Bound (UCB), to handle the tradeoff between exploitation and exploration, and employ a conservative mechanism to handle the safety constraint. By carefully integrating these two techniques, we develop a new algorithm, $C^{4}$-UCB, for this problem. Further, we rigorously prove an upper bound on the n-step regret in two situations: known baseline reward and unknown baseline reward. The regret in both situations is enlarged only by an additive constant term compared to the results for the $C^{3}$-bandit. Finally, experiments on synthetic and realistic datasets demonstrate the advantages of $C^{4}$-UCB.
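To make the conservative mechanism concrete, the following is a minimal sketch of the kind of safety check commonly used in conservative bandit algorithms for the known-baseline case. It is not the $C^{4}$-UCB algorithm itself, whose details the abstract does not give. The function name, the argument names, and the tolerance parameter `alpha` are all assumptions introduced for illustration; setting `alpha = 0` corresponds to requiring performance at least as good as the baseline.

```python
def conservative_choice(played_lcb_sum, candidate_lcb, baseline_reward, t, alpha=0.0):
    """Decide whether round t (1-based) may use the optimistic (UCB) action.

    Hypothetical helper, not taken from the paper.
    played_lcb_sum  : lower confidence bound on the reward collected in rounds 1..t-1
    candidate_lcb   : lower confidence bound on the reward of the UCB candidate
    baseline_reward : known per-round expected reward of the baseline strategy
    alpha           : assumed tolerance; the learner must keep at least
                      (1 - alpha) of the baseline's cumulative reward
    """
    # Reward the baseline strategy would have guaranteed after t rounds,
    # scaled by the allowed tolerance.
    required = (1.0 - alpha) * baseline_reward * t
    # Reward we can guarantee (pessimistically) if we play the candidate now.
    guaranteed = played_lcb_sum + candidate_lcb
    return "optimistic" if guaranteed >= required else "baseline"


if __name__ == "__main__":
    # Early on there is no accumulated slack, so the check falls back to the baseline.
    print(conservative_choice(0.0, 0.1, 0.5, t=1, alpha=0.1))
    # After ten baseline rounds (reward 0.5 each), enough slack has built up to explore.
    print(conservative_choice(5.0, 0.1, 0.5, t=11, alpha=0.1))
```

The design choice in this sketch is that exploration is only allowed when a pessimistic estimate of the cumulative reward still clears the baseline threshold, which is why early rounds tend to follow the baseline until some slack accumulates.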

Transferable Contextual Bandits with Prior Observations

Conversational Contextual Bandit: Algorithm and Application

Cold-start Problems in Recommendation Systems via Contextual-bandit Algorithms

Latent Contextual Bandits and their Application to Personalized Recommendations for New Users

Learning Contextual Bandits in a Non-stationary Environment

Context-Aware Bandits

Bandits Warm-up Cold Recommender Systems

Towards Domain Adaptive Neural Contextual Bandits

Con-CNAME: A Contextual Multi-armed Bandit Algorithm for Personalized Recommendations

Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts

Contextual Bandit Approach-based Recommendation System for Personalized Web-based Services

Achieving User-Side Fairness in Contextual Bandits

Neural Contextual Bandits for Personalized Recommendation

Jump Starting Bandits with LLM-Generated Prior Knowledge

A Contextual-Bandit Approach to Personalized News Article Recommendation

Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications

Fairness-aware Bandit-based Recommendation

Conservative Contextual Combinatorial Cascading Bandit

Deep Contextual Multi-armed Bandits

Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms

HELLINGER-UCB: A novel algorithm for stochastic multi-armed bandit problem and cold start problem in recommender system