Sequential Optimum Test with Multi-armed Bandits for Online Experimentation
Fang Kong, Penglei Zhao, Shichao Han, Yong Wang, Shuai Li
DOI: https://doi.org/10.1145/3627673.3680040
2024-01-01
Abstract: In large-scale online experimentation platforms, experimenters aim to discover the best treatment (arm) among multiple candidates. Traditional A/B testing and multi-armed bandit (MAB) algorithms are two popular designs. The former usually achieves higher power but may hurt customer satisfaction by continuing to recommend a poor arm, while the latter aims to improve the customer experience (collecting more rewards) at the cost of testing power. Recently, [26] combined the advantages of A/B testing and MAB algorithms to maximize testing power while maintaining higher rewards for experiments with two arms and Bernoulli rewards. However, in practice, the number of arms is usually larger than two and the reward type also varies. In multi-arm experiments, the sample size required to find the optimal arm while guaranteeing a false discovery rate blows up as the number of arms increases, bringing high opportunity costs to experimenters. To reduce the cost of a long experimental process, we propose a more efficient sequential test framework named Soptima that works with general reward types. Inspired by the design of traditional MAB algorithms in chasing rewards and of A/B testing in maximizing power, we propose an elimination-type strategy adapted to this framework that dynamically adjusts the traffic split across arms. Combined with Soptima, this strategy simultaneously retains the advantages of A/B testing in maximizing testing power, of sequential test methods in reducing the sample size, and of MAB algorithms in collecting rewards. The theoretical analysis provides guarantees on the Type-I, Type-II, and optimality error rates of the proposed approach. Experiments on both simulated and historical industrial data sets verify the superiority of our approach over available baselines.
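The abstract does not spell out the elimination-type strategy, so the following is only a minimal sketch of the general idea behind elimination-style traffic allocation, not the Soptima procedure itself. It implements a textbook successive-elimination rule on simulated Bernoulli arms: traffic is split evenly over the arms still active, and an arm is dropped once its upper confidence bound falls below the best lower confidence bound. The function name `successive_elimination`, the Hoeffding-style confidence radius, and the parameters `horizon` and `delta` are all assumptions made for illustration.

```python
import math
import random


def successive_elimination(means, horizon, delta=0.05, seed=0):
    """Generic successive elimination on simulated Bernoulli arms.

    Illustrative only: this is a standard bandit routine, not the
    paper's Soptima test. `means` are the (hypothetical) true success
    probabilities used to simulate rewards.
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))   # arms still receiving traffic
    pulls = [0] * k           # samples allocated to each arm
    sums = [0.0] * k          # cumulative reward of each arm
    t = 0
    while t < horizon and len(active) > 1:
        # Split traffic evenly: one sample per active arm in each round.
        for arm in active:
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pulls[arm] += 1
            sums[arm] += reward
            t += 1
        n = pulls[active[0]]  # every active arm has the same pull count
        # Hoeffding-style radius with a union bound over arms and rounds
        # (an assumed choice; other confidence sequences are possible).
        radius = math.sqrt(math.log(4.0 * k * n * n / delta) / (2.0 * n))
        best_lcb = max(sums[a] / pulls[a] - radius for a in active)
        # Keep only arms whose upper bound still reaches the best lower bound.
        active = [a for a in active if sums[a] / pulls[a] + radius >= best_lcb]
    return active, pulls


if __name__ == "__main__":
    survivors, allocation = successive_elimination([0.2, 0.5, 0.8], horizon=20000)
    print("surviving arms:", survivors)
    print("samples per arm:", allocation)
```

In this sketch, clearly inferior arms stop receiving traffic early, which is how an elimination rule can limit the reward lost to poor arms while the remaining arms keep an even split for comparison. The run also stops as soon as a single arm remains, so the sample size is not fixed in advance.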