Blinded sample size re-estimation in equivalence testing

Ekkehard Glimm, Lillian Yau, Heike Woehling
DOI: https://doi.org/10.1080/19466315.2020.1845232
2019-08-13
Abstract: This paper investigates type I error violations that occur when blinded sample size reviews are applied in equivalence testing. We give a derivation which explains why such violations are more pronounced in equivalence testing than in the case of superiority testing. In addition, the amount of type I error inflation is quantified by simulation as well as by some theoretical considerations. Non-negligible type I error violations arise when blinded interim re-assessments of sample sizes are performed, particularly if sample sizes are small but within the range of what is practically relevant.
Applications, Methodology
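The derivation mentioned in the abstract rests on how the blinded variance estimate behaves. The block below is a sketch of the standard lumped-variance decomposition used in the blinded-SSR literature, written in our own notation, which may differ from the paper's. It assumes 1:1 allocation with $n$ patients per arm at the interim review, observations $X_{ij}$ ($i \in \{T, C\}$, $j = 1, \dots, n$), common variance $\sigma^2$, true difference $\delta = \mu_T - \mu_C$, observed difference $d = \bar X_T - \bar X_C$, overall mean $\bar X$, and pooled within-group variance $S_p^2$.

```latex
S^2_{\mathrm{OS}}
  = \frac{1}{2n-1}\sum_{i\in\{T,C\}}\sum_{j=1}^{n}\bigl(X_{ij}-\bar X\bigr)^2
  = \frac{(2n-2)\,S_p^2 + \tfrac{n}{2}\,d^2}{2n-1},
\qquad
\mathbb{E}\bigl[S^2_{\mathrm{OS}}\bigr]
  = \sigma^2 + \frac{n}{2(2n-1)}\,\delta^2 .
```

Because $S^2_{\mathrm{OS}}$ increases with $d^2$, an interim estimate near zero (the favorable outcome when the true difference sits on the equivalence margin, $|\delta| = \Delta$) produces a small variance estimate and hence a small re-estimated sample size, while an estimate near the margin produces a large one. In a superiority test evaluated at $\delta = 0$, the favorable outcomes are those with large $d$, so the same mechanism pushes in the opposite direction there.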
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper explores the Type I error inflation that occurs when blinded sample size re-estimation (SSR) is applied in equivalence testing. Specifically, the authors investigate why this inflation is more pronounced in equivalence testing than in superiority testing, and they quantify its extent through simulation and theoretical analysis.

### Background and Motivation

1. **History of Sample Size Re-Estimation**: The concept of sample size re-estimation appeared in the statistical literature as early as the 1940s, and research on the topic has grown steadily since then.
2. **Blinded Sample Size Re-Estimation**: Methods that re-evaluate the sample size at an interim analysis without unblinding have been widely studied. They usually rely on the lumped (one-sample) variance of the interim data, computed without knowledge of each patient's treatment allocation.
3. **Problem Statement**: In superiority testing, the Type I error inflation caused by blinded SSR is very small and appears only with small sample sizes. In non-inferiority and equivalence testing, the inflation can be larger, notably for sample sizes that are small but still within the range of practical relevance.

### Main Research Content

1. **Decomposition of Total Variance**: The authors explain why Type I error inflation occurs in non-inferiority and equivalence testing by decomposing the total (lumped) variance into the within-group variance and a component driven by the treatment difference.
2. **Two One-Sided Tests (TOST)**: The paper introduces the TOST procedure for equivalence testing and discusses its behavior under different assumptions.
3. **Simulation Studies**: Simulations explore how the first-stage sample size, the second-stage sample size, and the effect size affect Type I error inflation.
4. **Influencing Factors**: The analysis covers how further factors, such as the first-stage sample size and minimum and maximum sample size limits, affect Type I error inflation.

### Main Findings

1. **Cause of Type I Error Inflation**: The blinded (lumped) variance estimate grows with the observed treatment difference. A favorable first stage, with an estimated difference near zero, therefore yields a small variance estimate and a small second stage, so this favorable evidence dominates the final test. An unfavorable first stage, with an estimated difference near the equivalence margin, yields a large variance estimate, and the resulting larger second stage dilutes that evidence. This asymmetry increases the probability of a Type I error.
2. **Effect Size Impact**: When the effect size is small, Type I error inflation is negligible; as the effect size increases, the inflation becomes more pronounced, especially when the effect size is close to the equivalence boundary.
3. **Role of Sample Size Limits**: Setting minimum and maximum sample size limits can partially mitigate Type I error inflation, but it may also lead to insufficient or excessive power.

### Practical Recommendations

1. **Increase the First-Stage Sample Size**: A larger first stage reduces Type I error inflation, but the additional time and cost must be taken into account.
2. **Set Sample Size Limits**: Minimum and maximum sample size limits can help control Type I error inflation to some extent, but they require balancing study power against resource use.

### Conclusion

Through theoretical analysis and simulation studies, the paper shows that blinded sample size re-estimation in equivalence testing can inflate the Type I error, and it offers practical recommendations. These findings are relevant to the design and analysis of clinical trials.
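The mechanism described above can be explored with a small Monte Carlo study. The sketch below is not the paper's simulation code. It assumes normal outcomes with a common SD `sigma`, 1:1 allocation, a true difference placed exactly on the upper equivalence margin (the boundary of H0), a blinded review that plugs the lumped one-sample variance into the usual TOST sample size approximation $n = 2\sigma^2(z_{1-\alpha}+z_{1-\beta/2})^2/\Delta^2$ per arm (planned for a true difference of zero), and a final TOST that ignores the interim look. The function name `simulate_type1_error` and its parameters are hypothetical, and NumPy and SciPy are assumed to be installed.

```python
import numpy as np
from scipy import stats


def simulate_type1_error(n1, margin, alpha=0.05, power=0.80, sigma=1.0,
                         n_min=None, n_max=None, n_sims=10_000, seed=2020):
    """Monte Carlo estimate of the Type I error of TOST after a blinded SSR.

    Hypothetical sketch: normal data, 1:1 allocation, common SD `sigma`,
    true difference placed on the upper equivalence margin (H0 boundary).
    `n1`, `n_min`, `n_max` and the returned sizes are per arm.
    """
    rng = np.random.default_rng(seed)
    z_alpha = stats.norm.ppf(1 - alpha)
    z_beta = stats.norm.ppf(1 - (1 - power) / 2)  # TOST planned at delta = 0
    lower = n1 if n_min is None else max(n1, n_min)  # never shrink below stage 1
    upper = np.inf if n_max is None else n_max
    if upper < lower:
        raise ValueError("n_max must be at least max(n1, n_min)")

    diffs = np.empty(n_sims)
    ses = np.empty(n_sims)
    sizes = np.empty(n_sims, dtype=int)
    for k in range(n_sims):
        # Stage 1: n1 patients per arm; the true difference equals the margin.
        x1 = rng.normal(margin, sigma, n1)
        y1 = rng.normal(0.0, sigma, n1)
        # Blinded review: one-sample variance of the lumped data (labels unused).
        s2_blind = np.var(np.concatenate([x1, y1]), ddof=1)
        n_req = np.ceil(2 * s2_blind * (z_alpha + z_beta) ** 2 / margin ** 2)
        n = int(min(max(n_req, lower), upper))
        # Stage 2: recruit the remaining n - n1 patients per arm.
        x = np.concatenate([x1, rng.normal(margin, sigma, n - n1)])
        y = np.concatenate([y1, rng.normal(0.0, sigma, n - n1)])
        # Final analysis ignores the interim look: pooled-variance t statistics.
        sp2 = (np.var(x, ddof=1) + np.var(y, ddof=1)) / 2
        diffs[k], ses[k], sizes[k] = x.mean() - y.mean(), np.sqrt(2 * sp2 / n), n

    t_crit = stats.t.ppf(1 - alpha, df=2 * sizes - 2)
    # TOST rejects H0 (non-equivalence) only if both one-sided tests reject.
    reject = ((diffs + margin) / ses > t_crit) & ((margin - diffs) / ses > t_crit)
    return reject.mean(), sizes
```

For example, `rate, sizes = simulate_type1_error(n1=10, margin=1.0)` estimates the Type I error for a first stage of 10 patients per arm and a margin of one standard deviation. Comparing `rate` with the nominal `alpha` across values of `n1`, `margin / sigma`, `n_min` and `n_max` shows how each factor discussed above moves the estimate. Setting `n_min = n_max = n1` removes the re-estimation step and gives a fixed-design reference point.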