Best Arm Identification in Linear Bandits with Linear Dimension Dependency.

Chao Tao,Saul A. Blanco,Yuan Zhou
2018-01-01
Abstract:We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown d-dimensional parameter vector theta, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized theta estimator based on the solution to the convex relaxation of an optimal G-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension d, as well as an algorithm with sample complexity dependent on the reward gaps of the best d arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.
What problem does this paper attempt to address?