Abstract: We define a general framework for a large class of combinatorial multi-armed bandit (CMAB) problems, in which simple arms with unknown distributions form super arms. In each round, a super arm is played and the outcomes of its constituent simple arms are observed, which informs the selection of super arms in future rounds. The reward of a super arm depends on the outcomes of the played arms and need only satisfy two mild assumptions, which admit a large class of nonlinear reward instances. We assume the availability of an (α, β)-approximation oracle that takes the means of the arms' distributions and outputs a super arm that, with probability β, generates an α fraction of the optimal expected reward. The objective of a CMAB algorithm is to minimize the (α, β)-approximation regret, which is the difference in total expected reward between the αβ fraction of the expected reward obtained by always playing the optimal super arm and the expected reward of playing super arms according to the algorithm. We provide the CUCB algorithm, which achieves O(log n) regret, where n is the number of rounds played, and we further provide distribution-independent bounds for a large class of reward functions. Our regret analysis is tight in that it matches the bound for the classical MAB problem up to a constant factor, and it significantly improves the regret bound of a recent paper on combinatorial bandits with linear rewards. We apply our CMAB framework to two new applications with nonlinear reward structures: probabilistic maximum coverage (PMC) for online advertising, and social influence maximization for viral marketing.
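The interaction between the oracle and the learner described in the abstract can be illustrated with a small CUCB-style loop. This is a minimal sketch under stated assumptions, not the authors' implementation: simple-arm outcomes are assumed Bernoulli, the reward is assumed linear so that an exact top-k oracle (α = β = 1) can be used, and the names `cucb`, `top_k_oracle`, and `super_arm_containing` are hypothetical. The exploration bonus sqrt(3 ln t / (2 T_i)), clipped at 1 because outcomes lie in [0, 1], reflects our reading of CUCB and should be checked against the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Set

# Hypothetical oracle interface: maps a vector of (adjusted) simple-arm means
# to a super arm, represented as a set of simple-arm indices.
Oracle = Callable[[Sequence[float]], Set[int]]


def top_k_oracle(k: int) -> Oracle:
    """Exact oracle (alpha = beta = 1) for a linear reward: the k largest means."""
    def oracle(means: Sequence[float]) -> Set[int]:
        ranked = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
        return set(ranked[:k])
    return oracle


def cucb(true_means: Sequence[float], oracle: Oracle,
         super_arm_containing: Callable[[int], Set[int]],
         n_rounds: int, seed: int = 0) -> List[Set[int]]:
    """CUCB-style loop on Bernoulli simple arms; returns the super arm played each round."""
    rng = random.Random(seed)
    m = len(true_means)
    counts = [0] * m        # T_i: number of observed outcomes of simple arm i
    emp_means = [0.0] * m   # empirical mean of simple arm i
    history: List[Set[int]] = []

    def play(super_arm: Set[int]) -> None:
        # Playing a super arm reveals the outcome of each of its simple arms.
        for i in super_arm:
            outcome = 1.0 if rng.random() < true_means[i] else 0.0
            counts[i] += 1
            emp_means[i] += (outcome - emp_means[i]) / counts[i]
        history.append(super_arm)

    # Initialization: observe every simple arm at least once
    # (hypothetical helper super_arm_containing(i) supplies a super arm with arm i).
    for i in range(m):
        if counts[i] == 0 and len(history) < n_rounds:
            super_arm = super_arm_containing(i)
            assert i in super_arm, "super_arm_containing(i) must include arm i"
            play(super_arm)

    # Main loop: feed optimistic (UCB-adjusted, clipped at 1) means to the oracle.
    for t in range(len(history) + 1, n_rounds + 1):
        adjusted = [
            min(1.0, emp_means[i] + math.sqrt(3 * math.log(t) / (2 * counts[i])))
            for i in range(m)
        ]
        play(oracle(adjusted))
    return history


if __name__ == "__main__":
    means = [0.9, 0.8, 0.2, 0.1]
    history = cucb(means, top_k_oracle(2),
                   lambda i: {i, (i + 1) % len(means)}, n_rounds=2000)
    # Fraction of rounds in which the optimal super arm {0, 1} was played.
    print(sum(s == {0, 1} for s in history) / len(history))
```

The oracle is treated as a black box that only sees adjusted means, which is what makes the loop optimistic; under the same assumptions, an approximation oracle with α < 1 (for example a greedy one for a nonlinear reward) could be substituted without changing the loop itself.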

Combinatorial Multi-Armed Bandit: General Framework and Applications.
