Best Action Selection In A Stochastic Environment

Yingce Xia, Tao Qin, Nenghai Yu, Tie-Yan Liu
DOI: https://doi.org/10.5555/2936924.2937036
2016-01-01
Abstract: We study the problem of selecting the best action from multiple candidates in a stochastic environment. In this setting, a player who takes an action receives a random reward and incurs a random cost, each drawn from an unknown distribution. The goal is to select the best action, defined as the one with the maximum ratio of expected reward to expected cost, after exploring the actions for n rounds. We study three mechanisms: (i) the uniform exploration mechanism MU; (ii) the successive elimination mechanism MSE; and (iii) the ratio confidence bound exploration mechanism MRCB. We prove that, for all three mechanisms, the probability that the best action is not selected (i.e., the error probability) is upper bounded by O(exp{-cn}), where c is a constant that depends on the mechanism and on coefficients of the actions. We then give an asymptotic lower bound on the error probability of consistent mechanisms in the Bernoulli setting and discuss how it relates to the upper bounds in several respects. The proposed mechanisms also reduce to the special cases in which only the rewards or only the costs are random. Finally, we evaluate the proposed mechanisms in numerical experiments.
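The abstract gives no pseudocode, so the following is only a minimal sketch of what a uniform exploration mechanism could look like in the Bernoulli setting. It assumes that the n rounds are spread over the actions round-robin and that the selected action is the one with the largest ratio of empirical reward to empirical cost. The function name `uniform_exploration`, its parameters, and the handling of an action whose observed cost is zero are illustrative assumptions, not taken from the paper.

```python
import random


def uniform_exploration(reward_probs, cost_probs, n, seed=0):
    """Illustrative sketch (not the paper's exact mechanism).

    reward_probs[i], cost_probs[i]: Bernoulli means of action i, unknown to
    the player and used here only to simulate the environment.
    n: total number of exploration rounds.
    Returns the index of the action with the largest empirical ratio.
    """
    rng = random.Random(seed)
    k = len(reward_probs)
    reward_sums = [0.0] * k
    cost_sums = [0.0] * k

    # Round-robin exploration: every action is tried about n / k times.
    for t in range(n):
        i = t % k
        reward_sums[i] += 1.0 if rng.random() < reward_probs[i] else 0.0
        cost_sums[i] += 1.0 if rng.random() < cost_probs[i] else 0.0

    def empirical_ratio(i):
        # Assumption: an action with zero observed cost is scored as
        # infinitely good if it earned any reward, and as 0 otherwise.
        if cost_sums[i] == 0.0:
            return float("inf") if reward_sums[i] > 0.0 else 0.0
        return reward_sums[i] / cost_sums[i]

    return max(range(k), key=empirical_ratio)


if __name__ == "__main__":
    # Action 0 has true ratio 0.9 / 0.3 = 3.0; action 1 has 0.2 / 0.8 = 0.25.
    print(uniform_exploration([0.9, 0.2], [0.3, 0.8], n=2000))
```

Because the actions are tried nearly equally often, the ratio of summed rewards to summed costs is close to the ratio of the empirical means; with unequal counts the two sums would need to be normalized separately.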