Abstract: In the classical best arm identification (Best-$1$-Arm) problem, we are given $n$ stochastic bandit arms, each associated with a reward distribution with an unknown mean. We would like to identify the arm with the largest mean with probability at least $1-\delta$, using as few samples as possible. The sample complexity of Best-$1$-Arm has attracted significant attention over the last decade, yet the exact sample complexity of the problem is still unknown. Recently, Chen and Li made the gap-entropy conjecture concerning the instance-wise sample complexity of Best-$1$-Arm. Given an instance $I$, let $\mu_{[i]}$ be the $i$th largest mean and $\Delta_{[i]}=\mu_{[1]}-\mu_{[i]}$ be the corresponding gap. The quantity $H(I)=\sum_{i=2}^n\Delta_{[i]}^{-2}$ is the complexity of the instance. The gap-entropy conjecture states that $\Omega\left(H(I)\cdot\left(\ln\delta^{-1}+\mathsf{Ent}(I)\right)\right)$ is an instance lower bound, where $\mathsf{Ent}(I)$ is an entropy-like term determined by the gaps, and that there is a $\delta$-correct algorithm for Best-$1$-Arm with sample complexity $O\left(H(I)\cdot\left(\ln\delta^{-1}+\mathsf{Ent}(I)\right)+\Delta_{[2]}^{-2}\ln\ln\Delta_{[2]}^{-1}\right)$. If the conjecture is true, we would have a complete understanding of the instance-wise sample complexity of Best-$1$-Arm. We make significant progress towards the resolution of the gap-entropy conjecture. For the upper bound, we provide a highly nontrivial algorithm which requires \[O\left(H(I)\cdot\left(\ln\delta^{-1} +\mathsf{Ent}(I)\right)+\Delta_{[2]}^{-2}\ln\ln\Delta_{[2]}^{-1}\mathrm{polylog}(n,\delta^{-1})\right)\] samples in expectation. For the lower bound, we show that for any Gaussian Best-$1$-Arm instance with gaps of the form $2^{-k}$, any $\delta$-correct monotone algorithm requires $\Omega\left(H(I)\cdot\left(\ln\delta^{-1} + \mathsf{Ent}(I)\right)\right)$ samples in expectation.
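To make the definitions of the gaps $\Delta_{[i]}$ and the instance complexity $H(I)$ concrete, the following minimal Python sketch computes both from a list of arm means. The function name `gaps_and_complexity` and the example means are hypothetical and chosen only for illustration; the sketch assumes the best arm is unique, so that every gap is strictly positive, and it does not compute $\mathsf{Ent}(I)$.

```python
def gaps_and_complexity(means):
    """Return the gaps Delta_[i] for i = 2..n and H(I) = sum_i Delta_[i]^(-2).

    `means` holds the (normally unknown) arm means; this helper is purely
    illustrative and is not part of any algorithm described in the abstract.
    """
    if len(means) < 2:
        raise ValueError("need at least two arms")
    # Sort so that ordered[i - 1] is mu_[i], the i-th largest mean.
    ordered = sorted(means, reverse=True)
    best = ordered[0]
    # Delta_[i] = mu_[1] - mu_[i] for i = 2..n.
    gaps = [best - m for m in ordered[1:]]
    if gaps[0] <= 0:
        raise ValueError("the best arm must be unique")
    # H(I) = sum over i = 2..n of Delta_[i]^(-2).
    complexity = sum(g ** -2 for g in gaps)
    return gaps, complexity


# Example instance (assumed values) whose gaps have the form 2^(-k).
gaps, H = gaps_and_complexity([1.0, 0.5, 0.75, 0.5])
print(gaps)  # [0.25, 0.5, 0.5]
print(H)     # 24.0
```

In this example the single arm at gap $1/4$ contributes $16$ of the total $H(I)=24$, which illustrates how arms close to the best one dominate the instance complexity.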

Best Arm Identification in Linear Bandits with Linear Dimension Dependency

Minimax Optimal Fixed-Budget Best Arm Identification in Linear Bandits

Fixed-Budget Best-Arm Identification in Sparse Linear Bandits

Best Arm Identification in Bandits with Limited Precision Sampling

Pure Exploration in Bandits with Linear Constraints

Best Arm Identification in Spectral Bandits

Multi-armed linear bandits with latent biases

Best Arm Identification in Batched Multi-armed Bandit Problems

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Robust Best-arm Identification in Linear Bandits

Constrained Best Arm Identification in Grouped Bandits

Practical Algorithms for Best-K Identification in Multi-Armed Bandits

Towards Instance Optimal Bounds for Best Arm Identification

Optimal Best Arm Identification with Fixed Confidence in Restless Bandits

Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits

Best Arm Identification with Minimal Regret

Best Arm Identification in Stochastic Bandits: Beyond $\beta$-optimality

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

Adaptive Multiple-Arm Identification

Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture

On Sequential Elimination Algorithms for Best-Arm Identification in Multi-Armed Bandits