Ranking by Lifts: A Cost-Benefit Approach to Large-Scale A/B Tests

Pallavi Basu, Ron Berman
2024-07-01
Abstract: A/B testers conducting large-scale tests prioritize lifts and want to be able to control false rejections of the null. This work develops a decision-theoretic framework for maximizing profits subject to false discovery rate (FDR) control. We build an empirical Bayes solution for the problem via the greedy knapsack approach. We derive an oracle rule based on ranking the ratio of expected lifts and the cost of wrong rejections using the local false discovery rate (lfdr) statistic. Our oracle decision rule is valid and optimal for large-scale tests. Further, we establish asymptotic validity for the data-driven procedure and demonstrate finite-sample validity in experimental studies. We also demonstrate the merit of the proposed method over other FDR control methods. Finally, we discuss an application to actual Optimizely experiments.
Methodology, Machine Learning, Applications
### What problem does this paper attempt to solve?

This paper addresses how to maximize lifts while controlling the false discovery rate (FDR) in large-scale A/B testing. Specifically, it proposes a decision-theoretic framework that maximizes expected profit subject to FDR control. The core problems and the proposed solutions are outlined below.

#### Core problems

1. **Maximizing lifts**: In A/B testing, experimenters are usually concerned with lift, that is, the improvement of the new version relative to the baseline version. The key issue is how to select, among a large number of tests, the experiments with meaningful lifts.
2. **Controlling the false discovery rate (FDR)**: In large-scale A/B testing, if no multiplicity correction is applied, the proportion of false rejections of the null hypothesis can be very high, producing many "false positive" results. A method is therefore needed that keeps the share of false discoveries among the rejections under control.
3. **Cost-benefit analysis**: Each test carries a cost, in particular the cost of a wrong rejection. The procedure has to weigh the expected profit of each rejection against this cost while still controlling FDR.

#### Solutions

1. **Decision-theoretic framework**: The paper proposes a framework based on decision theory that selects tests by maximizing the expected profit. Specifically, the objective is to maximize \( \mathbb{E} \left[ \sum_{i \in R} \Delta_i \right] \), where \( R \) denotes the set of selected (rejected) tests, \( \Delta_i = p_{ri} \ell_i \), \( p_{ri} \) is the baseline profit of the \( i \)-th experiment, and \( \ell_i \) is the observed lift.
2. **Empirical Bayes method**: The optimal decision rule is estimated with an empirical Bayes method. Using the greedy knapsack approach, the paper derives a ranking rule based on the local false discovery rate (lfdr) to choose which tests to reject.
3. **Local false discovery rate (lfdr)**: The local false discovery rate is the central statistic, defined as \( \text{lfdr}(z_i) = P(\theta_i = 0 \mid Z_i = z_i) \), where \( \theta_i = 0 \) indicates that the \( i \)-th null hypothesis is true. The ranking rule and the FDR control are both built on this statistic.
4. **Second-order mean correction**: The paper proposes a second-order mean correction for the log-transformed ratio to improve the accuracy of the estimate:

   \[ \mathbb{E}[\ln(\hat{p}_0/p_0)] \approx -\frac{1}{2} \text{Var}(\hat{p}_0/p_0) \]

   This approximation follows from a second-order Taylor expansion of \( \ln x \) around \( x = 1 \) when \( \hat{p}_0 \) is (approximately) unbiased for \( p_0 \).
5. **Numerical simulation and practical application**: The method is evaluated on synthetic data and on actual Optimizely experiment data. The reported results show that, compared with existing FDR control methods, it significantly improves the total lift while controlling FDR.

#### Summary

The paper addresses how to maximize lifts subject to FDR control in large-scale A/B testing by combining a decision-theoretic framework, an empirical Bayes method, and the local false discovery rate. According to the reported simulations and the Optimizely application, the resulting procedure achieves a higher total lift than existing FDR control methods while keeping FDR controlled.
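
The lfdr-based ranking described under Solutions can be illustrated with a minimal Python sketch. It is not the paper's exact procedure: it assumes the lfdr values and the expected gains are already estimated, assumes all gains are positive, and uses a common estimated-FDR budget, namely that the selected tests satisfy \( \sum_{i \in R} (\text{lfdr}_i - \alpha) \le 0 \). The function name `greedy_lfdr_selection` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def greedy_lfdr_selection(lfdr, gain, alpha=0.1):
    """Greedy knapsack-style selection under an estimated-FDR budget.

    Illustrative assumptions (not taken from the paper):
      * lfdr[i] is an estimate of P(theta_i = 0 | z_i),
      * gain[i] > 0 is the expected profit from rejecting test i,
      * the budget is sum over selected i of (lfdr[i] - alpha) <= 0.
    Returns the sorted indices of the selected tests.
    """
    lfdr = np.asarray(lfdr, dtype=float)
    gain = np.asarray(gain, dtype=float)
    if lfdr.shape != gain.shape:
        raise ValueError("lfdr and gain must have the same shape")
    if np.any(gain <= 0):
        raise ValueError("this sketch assumes strictly positive gains")

    # Knapsack weight of each test: positive weights consume the budget,
    # non-positive weights add slack to it.
    weight = lfdr - alpha

    # Tests with lfdr <= alpha never hurt the budget, so take them all.
    free = np.flatnonzero(weight <= 0)
    selected = list(free)
    slack = -weight[free].sum()

    # Rank the remaining tests by gain per unit of budget consumed.
    costly = np.flatnonzero(weight > 0)
    ratio = gain[costly] / weight[costly]
    order = costly[np.argsort(-ratio, kind="stable")]

    # Add tests in ranked order; stop at the first one that does not fit.
    for i in order:
        if weight[i] > slack:
            break
        selected.append(i)
        slack -= weight[i]

    return np.sort(np.array(selected, dtype=int))


# Usage with made-up numbers.
lfdr = [0.01, 0.02, 0.20, 0.15, 0.90]
gain = [1.0, 2.0, 5.0, 1.0, 3.0]
selected = greedy_lfdr_selection(lfdr, gain, alpha=0.1)
print(selected)                          # indices of the selected tests
print(np.asarray(lfdr)[selected].mean()) # average lfdr of the selection
```

Stopping at the first test that does not fit keeps the rule a pure ranking-and-threshold rule. A variant that skips such a test and continues down the ranking is also possible, though it no longer corresponds to a single cutoff on the ranked ratio.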