Abstract: A/B testers conducting large-scale tests prioritize lifts and want to be able to control false rejections of the null. This work develops a decision-theoretic framework for maximizing profits subject to false discovery rate (FDR) control. We build an empirical Bayes solution for the problem via the greedy knapsack approach. We derive an oracle rule based on ranking the ratio of expected lifts and the cost of wrong rejections using the local false discovery rate (lfdr) statistic. Our oracle decision rule is valid and optimal for large-scale tests. Further, we establish asymptotic validity for the data-driven procedure and demonstrate finite-sample validity in experimental studies. We also demonstrate the merit of the proposed method over other FDR control methods. Finally, we discuss an application to actual Optimizely experiments.

What problem does this paper attempt to address?

This paper addresses how to maximize lifts while controlling the false discovery rate (FDR) in large-scale A/B testing. It proposes a decision-theoretic framework that maximizes expected profit subject to FDR control. The core problems and the proposed solutions are summarized below.

#### Core problems

1. **Maximizing lifts**: In A/B testing, experimenters usually care about lift, that is, the improvement of the new version relative to the baseline. The key issue is how to select, from a large number of tests, the experiments whose lifts are worth acting on.

2. **Controlling the false discovery rate (FDR)**: In large-scale A/B testing, rejecting null hypotheses without a multiplicity correction can produce a high proportion of false rejections ("false positives"). A procedure is therefore needed that keeps the FDR below a chosen level.

3. **Cost-benefit analysis**: Every test carries a cost. Maximizing profit while controlling the FDR requires weighing the expected lift of each rejection against the cost of a wrong rejection.

#### Solutions

1. **Decision-theoretic framework**: The paper selects the rejection set \( R \) to maximize expected profit. The objective is to maximize \( \mathbb{E} \left[ \sum_{i \in R} \Delta_i \right] \), where \( \Delta_i = p_{ri} \ell_i \), \( p_{ri} \) is the baseline profit of the \( i \)-th experiment, and \( \ell_i \) is the observed lift.

2. **Empirical Bayes method**: The optimal decision rule is estimated with an empirical Bayes approach. Using a greedy knapsack argument, the paper derives a ranking rule based on the local false discovery rate (lfdr) to choose which tests to reject.

3. **Local false discovery rate (lfdr)**: The lfdr is defined as \( \text{lfdr}(z_i) = P(\theta_i = 0 \mid Z_i = z_i) \), where \( \theta_i = 0 \) indicates that the \( i \)-th null hypothesis is true. The oracle rule ranks tests by the ratio of expected lift to the cost of a wrong rejection, both expressed through the lfdr statistic.

4. **Second-order mean correction**: The paper proposes a second-order mean correction for the log-transformed ratio to improve the accuracy of the estimate:

   \[ \mathbb{E}[\ln(\hat{p}_0/p_0)] \approx -\frac{1}{2} \text{Var}(\hat{p}_0/p_0) \]

5. **Numerical simulation and practical application**: The method is evaluated on synthetic data and on actual Optimizely experiment data. According to the reported results, it achieves a higher total lift than existing FDR control methods while still controlling the FDR.

#### Summary

The paper addresses how to maximize lifts under FDR control in large-scale A/B testing by combining a decision-theoretic framework, an empirical Bayes method, and the local false discovery rate. The resulting procedure ranks tests by expected lift relative to the cost of a wrong rejection, and it is supported by both simulation studies and an application to real experiments.
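The lfdr-based greedy ranking described under Solutions can be illustrated with a small Python sketch. It is not the paper's exact procedure: it assumes a two-group normal model with known parameters (`pi0`, `mu1`, `sd1` are hypothetical inputs), takes nonnegative per-test gains as given, and enforces the common constraint that the average lfdr of the rejected set stays at or below `alpha`. The function names `lfdr_two_group` and `greedy_select` are illustrative.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def lfdr_two_group(z, pi0, mu1, sd1=1.0):
    """lfdr(z) = pi0 * f0(z) / f(z) under an assumed two-group model.

    Nulls are N(0, 1); non-nulls are N(mu1, sd1^2); pi0 is the null proportion.
    """
    f0 = normal_pdf(z)
    f1 = normal_pdf(z, mu1, sd1)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)


def greedy_select(lfdr, gain, alpha):
    """Greedy knapsack selection under the constraint mean(lfdr[R]) <= alpha.

    Each test has weight lfdr_i - alpha. Tests with nonpositive weight are
    rejected outright and add capacity. The rest are ranked by gain per unit
    weight and added until the first one that no longer fits.
    Assumes all gains are nonnegative.
    """
    weights = [l - alpha for l in lfdr]
    selected = [i for i, w in enumerate(weights) if w <= 0]
    capacity = -sum(weights[i] for i in selected)

    costly = [i for i, w in enumerate(weights) if w > 0]
    costly.sort(key=lambda i: gain[i] / weights[i], reverse=True)
    for i in costly:
        if weights[i] > capacity:
            break  # ranking rule: stop at the first test that breaks the budget
        selected.append(i)
        capacity -= weights[i]
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical lfdr values and gains for five tests.
    lfdr = [0.01, 0.02, 0.08, 0.30, 0.60]
    gain = [1, 1, 3, 2, 10]
    print(greedy_select(lfdr, gain, alpha=0.05))  # -> [0, 1, 2]
```

In the example, the two tests with lfdr below 0.05 free up a budget of 0.07, which is enough to also reject the third test (weight 0.03) but not the high-gain fifth test (weight 0.55). In a data-driven version, `pi0` and the densities would be estimated from the observed z-scores rather than supplied.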

Ranking by Lifts: A Cost-Benefit Approach to Large-Scale A/B Tests

Rapid and Scalable Bayesian AB Testing

Empirical Bayes Multistage Testing for Large-Scale Experiments

A framework for Multi-A(rmed)/B(andit) testing with online FDR control

Learning Metrics that Maximise Power for Accelerated A/B-Tests

Powerful A/B-Testing Metrics and Where to Find Them

Variance Reduction in Ratio Metrics for Efficient Online Experiments

Large-scale Multiple Testing: Fundamental Limits of False Discovery Rate Control and Compound Oracle

Bootstrap Matching: a robust and efficient correction for non-random A/B test, and its applications

Anytime-Valid Confidence Sequences in an Enterprise A/B Testing Platform

Precise unbiased estimation in randomized experiments using auxiliary observational data

Comparison Lift: Bandit-based Experimentation System for Online Advertising

Machine-Learning Tests for Effects on Multiple Outcomes

Near-Optimal Experimental Design under the Budget Constraint in Online Platforms

Online control of the false discovery rate in biomedical research

Risk-aware product decisions in A/B tests with multiple metrics

Short-lived High-volume Multi-A(rmed)/B(andits) Testing

A more practical approach for the Benjamini-Hochberg FDR controlling procedure for huge-scale testing problems

A Common Misassumption in Online Experiments with Machine Learning Models

Simultaneous high-probability bounds on the false discovery proportion in structured, regression, and online settings

A New Procedure for Controlling False Discovery Rate in Large-Scale t-tests