16.2 FDR control with e-values (e-BH)

Alexandra Chouldechova, E Candes
Abstract: One example of using e-values is testing multiple hypotheses in a multi-armed bandit problem with $K$ arms, where null hypothesis $k$ states that arm $k$ has mean reward at most 1. In such problems, at each time $t \ge 1$ one pulls arm $k_t$ and obtains an iid reward $X_{k_t,t} \ge 0$; the aim is to quickly detect arms with mean $> 1$, in order to maximize profit or minimize regret. Exploration and exploitation usually induce a complicated dependence structure, so classical ways of handling the tests are non-trivial. However, it is easy to construct e-values by considering the running reward $M_{k,t} = \prod_{j=1}^{t} X_{k,j}^{\mathbb{1}\{k_j = k\}}$, that is, the product of the rewards from the pulls of arm $k$ up to time $t$; then $M_{1,\tau}, \dots, M_{K,\tau}$ are e-values for any stopping time $\tau$.
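To make the construction concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the running-product e-value of each arm from a log of pulls and then applies the e-BH rule named in the section title: with $e_{[1]} \ge \dots \ge e_{[K]}$ the sorted e-values, reject the $k^*$ hypotheses with the largest e-values, where $k^* = \max\{k : k\, e_{[k]} \ge K/\alpha\}$. The function names `running_evalues` and `ebh`, the 0-based arm indices, and the toy data are assumptions made for illustration; the log is simply read to its end, which stands in for the stopping time $\tau$.

```python
def running_evalues(pulls, rewards, num_arms):
    """Return M_{k,tau} for each arm k: the product of the rewards
    observed on the pulls of arm k (1.0 if the arm was never pulled).

    pulls[j] is the (0-based) arm pulled at step j, rewards[j] >= 0
    is the reward observed at that step.
    """
    evalues = [1.0] * num_arms
    for arm, reward in zip(pulls, rewards):
        evalues[arm] *= reward
    return evalues


def ebh(evalues, alpha):
    """e-BH: reject the k* hypotheses with the largest e-values, where
    k* is the largest rank k with k * e_[k] >= K / alpha.

    Returns the sorted indices of the rejected hypotheses.
    """
    num_hyp = len(evalues)
    # Indices ordered from the largest e-value to the smallest.
    order = sorted(range(num_hyp), key=lambda i: evalues[i], reverse=True)
    k_star = 0
    for rank, idx in enumerate(order, start=1):
        if rank * evalues[idx] >= num_hyp / alpha:
            k_star = rank
    return sorted(order[:k_star])


# Toy log with three arms (hypothetical data, for illustration only).
pulls = [0, 1, 2, 0, 1, 2, 0, 2]
rewards = [4.0, 0.5, 3.0, 5.0, 1.0, 4.0, 2.0, 5.0]

evalues = running_evalues(pulls, rewards, num_arms=3)
rejected = ebh(evalues, alpha=0.1)
print(evalues)   # [40.0, 0.5, 60.0]
print(rejected)  # [0, 2]
```

In this toy run the threshold is $K/\alpha = 30$: the largest e-value (60, rank 1) and the second largest (40, rank 2, so $2 \times 40 = 80$) both clear it, while the third ($3 \times 0.5 = 1.5$) does not, so arms 0 and 2 are flagged as having mean reward above 1.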