Abstract: Randomized controlled trials (RCTs) can be used to generate guarantees on treatment effects. However, RCTs often spend unnecessary resources exploring sub-optimal treatments, which can reduce the power of treatment guarantees. To address these concerns, we develop a two-stage RCT where, first, in a data-driven screening stage, we prune low-impact treatments, and then, in the second stage, we develop high-probability lower bounds on the treatment effect. Unlike existing adaptive RCT frameworks, our method is simple enough to be implemented in scenarios with limited adaptivity. We derive optimal designs for two-stage RCTs and demonstrate how we can implement such designs through sample splitting. Empirically, we demonstrate that two-stage designs improve upon single-stage approaches, especially in scenarios where domain knowledge is available in the form of a prior. Our work is thus a simple yet effective method to estimate high-probability certificates for high-performing treatment effects in an RCT.

### What problem does this paper attempt to solve?

This paper addresses inefficient resource allocation in randomized controlled trials (RCTs). Traditional single-stage RCTs usually allocate samples evenly across all treatment regimens, which wastes resources on less effective regimens and reduces the statistical power available for the most effective ones. Specifically:

1. **Resource waste**: Traditional RCTs spend a large share of their resources exploring sub-optimal treatment regimens that may never be adopted.
2. **Insufficient statistical power**: Because samples are spread across all regimens, the effects of the best regimens are estimated with low precision.
3. **Complexity of adaptive design**: Although adaptive designs can improve statistical performance, they are harder to implement, especially when outcomes are delayed.

To solve these problems, the paper proposes a two-stage RCT design. The first stage screens for potentially highly effective treatment regimens in a data-driven manner, and the second stage concentrates the remaining resources on those regimens for more accurate effect estimation. The method aims to keep the experimental design simple while improving statistical power and producing a high-probability lower bound (certificate) on treatment effects.

### Main contributions of the paper

1. **Novel algorithm**: A new two-stage RCT algorithm is designed that identifies a high-probability lower bound on treatment effects, together with a proof of its approximation guarantee relative to the optimal strategy.
2. **Bayesian extension**: The method is extended to a Bayesian framework in which the experimenter has a prior distribution over the mean of each treatment regimen, and the corresponding approximation guarantee is provided.
3. **Empirical verification**: Experiments on synthetic data and real-world data sets show that the two-stage design outperforms the single-stage design without increasing complexity.

### Formula representation

The following are some key formulas involved in the paper:

- **Hoeffding inequality**: \[ P\left(\left|Y_i-\mu_i\right|<r\sqrt{\frac{k}{2s_2}\log\left(\frac{2}{\delta}\right)}\right)\geq1 - \delta \] where \(Y_i\) is the sample mean in the second stage, \(\mu_i\) is the true mean, \(s_2\) is the sample size in the second stage, \(k\) is the number of retained treatment regimens, and \(1 - \delta\) is the confidence level.
- **Certificate calculation**: \[ l = Y_i - r\sqrt{\frac{k}{2s_2}\log\left(\frac{2}{\delta}\right)} \]
- **Tail bound of sub-Gaussian variables**: \[ P\left(\left|Y_i-\mu_i\right|\leq r\sqrt{\frac{2k}{s_2}\log\left(\frac{2}{\delta}\right)}\right)\geq1 - \delta \]

These formulas describe how to estimate the lower bound of treatment effects in a two-stage RCT and ensure the statistical validity of the estimate.
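
The Hoeffding-based certificate above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's optimal design. It makes several assumptions: outcomes are Bernoulli (so the range constant is taken as \(r = 1\)), each stage splits its samples evenly, and screening keeps the \(k\) arms with the highest first-stage sample means. The names `hoeffding_radius` and `two_stage_certificates` are hypothetical. The radius is the per-arm bound quoted in the summary; a guarantee that holds simultaneously for all \(k\) retained arms would need a correction such as replacing \(\delta\) with \(\delta/k\), which the summary does not specify.

```python
import math
import random


def hoeffding_radius(r, k, s2, delta):
    """Half-width r * sqrt(k / (2 * s2) * log(2 / delta)) of the per-arm bound."""
    return r * math.sqrt(k / (2 * s2) * math.log(2 / delta))


def two_stage_certificates(true_means, s1, s2, k, delta, seed=0):
    """Simulate screening plus certification on Bernoulli arms (assumed r = 1).

    Returns (kept, certs): the indices retained after stage 1 and a dict
    mapping each retained index to its lower-bound certificate.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)

    # Stage 1 (screening): spread s1 samples evenly over all arms.
    n1 = s1 // n_arms
    stage1_means = [
        sum(rng.random() < mu for _ in range(n1)) / n1 for mu in true_means
    ]

    # Assumed screening rule: keep the k arms with the largest stage-1 means.
    ranked = sorted(range(n_arms), key=lambda i: stage1_means[i], reverse=True)
    kept = ranked[:k]

    # Stage 2 (certification): fresh samples, spread only over retained arms.
    # s2 should be divisible by k so each arm gets exactly s2 / k samples.
    n2 = s2 // k
    radius = hoeffding_radius(1.0, k, s2, delta)
    certs = {}
    for i in kept:
        y_i = sum(rng.random() < true_means[i] for _ in range(n2)) / n2
        certs[i] = y_i - radius  # l = Y_i - r * sqrt(k / (2 s2) * log(2 / delta))
    return kept, certs


if __name__ == "__main__":
    kept, certs = two_stage_certificates(
        [0.1, 0.2, 0.3, 0.8, 0.9], s1=500, s2=400, k=2, delta=0.05
    )
    print("retained arms:", kept)
    print("certificates:", certs)
    # Pruning tightens the bound: fewer retained arms means a smaller radius.
    print("radius with k=2:", hoeffding_radius(1.0, 2, 400, 0.05))
    print("radius with k=5:", hoeffding_radius(1.0, 5, 400, 0.05))
```

Because the radius scales with \(\sqrt{k}\), retaining 2 arms instead of 5 for the same \(s_2\) shrinks the interval by a factor of \(\sqrt{2/5}\), which is the mechanism by which screening improves the certificate in this sketch.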

Data-driven Design of Randomized Control Trials with Guaranteed Treatment Effects

Adaptive Clinical Trials: Exploiting Sequential Patient Recruitment and Allocation

Randomization to Randomization Probability: Estimating Treatment Effects under Actual Conditions of Use.

An Adaptive Trial Design to Optimize Dose-Schedule Regimes with Delayed Outcomes

BASIC: A Bayesian Adaptive Synthetic-Control Design for Phase II Clinical Trials.

Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation

A practical Response Adaptive Block Randomization (RABR) design with analytic type I error protection

Response adaptive randomization procedures in seamless phase II/III clinical trials.

Enhancing Statistical Validity and Power in Hybrid Controlled Trials: A Randomization Inference Approach with Conformal Selective Borrowing

Stratification Trees for Adaptive Randomization in Randomized Controlled Trials

Selective Randomization Inference for Adaptive Experiments

Implementing Response-Adaptive Randomisation in Stratified Rare-disease Trials: Design Challenges and Practical Solutions

Design-Based RCT Estimators and Central Limit Theorems for Baseline Subgroup and Related Analyses

Precise unbiased estimation in randomized experiments using auxiliary observational data

Robust integration of external control data in randomized trials

The use of real-world data for clinical investigation of effectiveness in drug development

A matching design for augmenting a randomized clinical trial with external control

Design-Based Estimation and Central Limit Theorems for Local Average Treatment Effects for RCTs

RCT Rejection Sampling for Causal Estimation Evaluation

SECRETS: Subject-Efficient Clinical Randomized Controlled Trials using Synthetic Intervention

A Two-Stage Patient-Focused Study Design for Rare Disease Controlled Trials