Abstract: Classic contextual combinatorial multi-armed bandit problems aim to maximize the expected cumulative joint reward over the long run: in each round, a learner plays a set of arms (i.e., a super arm) whose rewards are time-invariant linear functions of the context features. However, in many real-world applications the linear-reward assumption fails and the environment is generally non-stationary, so these bandit models perform poorly. Existing works do not handle non-linear rewards in a non-stationary environment, and the algorithmic challenge remains open. In this paper, we initiate the study of a non-stationary neural contextual combinatorial bandit problem, in which the reward function of each individual arm can be estimated by a deep neural network under a boundedness assumption and a time-variant reward mapping function. Furthermore, we design an algorithm, NNCMAB, which dynamically partitions the context space into multiple subspaces and fits the reward mapping function of each subspace with a neural network, so that only the models of the affected subspaces are re-trained when local environment changes occur. NNCMAB provably achieves $\tilde{O}\left(T^{\frac{3}{4}}+\sqrt{T}N_{c}\right)$ regret, where $T$ is the number of rounds and $N_{c}$ is a parameter associated with the distribution change. Evaluation results on synthetic and real-world LastFM datasets show that NNCMAB significantly outperforms other state-of-the-art methods with both linear and non-linear individual rewards under non-stationary environments.
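The partition-and-retrain idea described in the abstract can be illustrated with a small sketch. This is not NNCMAB itself; it rests on several simplifying assumptions that are not in the source: contexts lie in $[0,1]^d$, the partition is a fixed uniform grid rather than a dynamic one, a running mean stands in for each subspace's neural network, arms are chosen by epsilon-greedy top-$k$ selection, and a local change is flagged when the windowed mean absolute error exceeds a threshold. All class and parameter names (`CellModel`, `PartitionedBandit`, `threshold`, `window`) are hypothetical.

```python
import random
from collections import deque


class CellModel:
    """Reward estimate for one context cell.

    Assumption: a running mean stands in for the per-subspace neural network.
    """

    def __init__(self, window=20):
        self.window = window
        self.n = 0
        self.mean = 0.0
        self.errors = deque(maxlen=window)  # recent absolute prediction errors

    def predict(self):
        return self.mean

    def update(self, reward):
        self.errors.append(abs(reward - self.mean))  # error before learning
        self.n += 1
        self.mean += (reward - self.mean) / self.n

    def drifted(self, threshold):
        # Flag a local change once a full window of errors averages above threshold.
        full = len(self.errors) == self.window
        return full and sum(self.errors) / self.window > threshold


class PartitionedBandit:
    """Hypothetical sketch: one model per grid cell, reset only where drift occurs."""

    def __init__(self, cells_per_dim=4, k=2, epsilon=0.1, threshold=0.3,
                 window=20, seed=0):
        self.cells_per_dim = cells_per_dim
        self.k = k                      # size of the super arm
        self.epsilon = epsilon          # exploration rate (assumed rule)
        self.threshold = threshold      # drift threshold (assumed rule)
        self.window = window
        self.rng = random.Random(seed)
        self.models = {}                # cell index -> CellModel
        self.resets = 0

    def cell_of(self, context):
        # Map a context in [0, 1]^d to the index of its grid cell.
        top = self.cells_per_dim - 1
        return tuple(min(int(x * self.cells_per_dim), top) for x in context)

    def model_for(self, context):
        cell = self.cell_of(context)
        if cell not in self.models:
            self.models[cell] = CellModel(self.window)
        return self.models[cell]

    def select(self, contexts):
        # Choose k arm indices: random with probability epsilon, else greedy top-k.
        if self.rng.random() < self.epsilon:
            return self.rng.sample(range(len(contexts)), self.k)
        scores = [self.model_for(c).predict() for c in contexts]
        return sorted(range(len(contexts)), key=lambda i: -scores[i])[:self.k]

    def update(self, context, reward):
        model = self.model_for(context)
        model.update(reward)
        if model.drifted(self.threshold):
            # Only the affected cell's model is discarded and relearned.
            self.models[self.cell_of(context)] = CellModel(self.window)
            self.resets += 1
```

In a round, one would call `select` on the list of arm contexts, play the returned arms, and call `update` once per played arm with its observed reward. A reward shift confined to one cell resets only that cell's model, which is the locality property the abstract attributes to the subspace partition.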

Contextual Bandit Applications in Customer Support Bot

Conversational Contextual Bandit: Algorithm and Application

Deep Contextual Multi-armed Bandits

A Survey on Practical Applications of Multi-Armed and Contextual Bandits

Selectively Contextual Bandits

Learning Contextual Bandits in a Non-stationary Environment

LLMs-augmented Contextual Bandit

Neural Contextual Bandits for Personalized Recommendation

A Nonparametric Contextual Bandit with Arm-level Eligibility Control for Customer Service Routing

AutoML for Contextual Bandits

Partially Observable Contextual Bandits with Linear Payoffs

Context-Aware Bandits

Neural Contextual Combinatorial Bandit under Non-stationary Environment

Efficient Explorative Key-term Selection Strategies for Conversational Contextual Bandits

Towards Domain Adaptive Neural Contextual Bandits

Latent Contextual Bandits and their Application to Personalized Recommendations for New Users

Contextual Bandit Approach-based Recommendation System for Personalized Web-based Services

Risk-Aware Continuous Control with Neural Contextual Bandits

contextual: Evaluating Contextual Multi-Armed Bandit Problems in R

Simple Regret Minimization for Contextual Bandits

Convolutional Neural Bandit for Visual-aware Recommendation