Abstract

Background: We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data come in.

Methods: We illustrate the approach using data for the diagnosis of ovarian cancer (n = 5914, 33% event fraction) and obstructive coronary artery disease (CAD; n = 4888, 44% event fraction). We used logistic regression to develop a prediction model consisting only of a priori selected predictors and assumed linear relations for continuous predictors. We mimicked prospective patient recruitment by developing the model on 100 randomly selected patients, and we used bootstrapping to internally validate the model. We sequentially added 50 random new patients until we reached a sample size of 3000 and re-estimated model performance at each step. We examined the sample size required to satisfy the following stopping rule: obtaining a calibration slope ≥ 0.9 and optimism in the c-statistic (or AUC) ≤ 0.02 at two consecutive sample sizes. This procedure was repeated 500 times. We also investigated the impact of alternative modeling strategies: modeling nonlinear relations for continuous predictors and correcting for bias in the model estimates (Firth’s correction).

Results: Better discrimination was achieved in the ovarian cancer data (c-statistic 0.9 with 7 predictors) than in the CAD data (c-statistic 0.7 with 11 predictors). Adequate calibration and limited optimism in discrimination were achieved after a median of 450 patients (interquartile range 450–500) for the ovarian cancer data (22 events per parameter (EPP), 20–24) and 850 patients (750–900) for the CAD data (33 EPP, 30–35). A stricter criterion, requiring AUC optimism ≤ 0.01, was met with a median of 500 (23 EPP) and 1500 (59 EPP) patients, respectively. These sample sizes were much higher than those implied by the well-known 10 EPP rule of thumb and slightly higher than those given by a recently published fixed sample size calculation method by Riley et al. Higher sample sizes were required when nonlinear relationships were modeled, and lower sample sizes when Firth’s correction was used.

Conclusions: Adaptive sample size determination can be a useful supplement to fixed a priori sample size calculations, because it allows the sample size to be tailored to the specific prediction modeling context in a dynamic fashion.
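The stopping rule described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the calibration slope and AUC optimism have already been estimated (for example by bootstrapping) at each monitored sample size, and that the "required sample size" is the second of the two consecutive sample sizes meeting the criteria. The function name `first_stopping_size` and the monitoring trace are hypothetical and serve only to show the logic.

```python
from typing import Optional, Sequence


def first_stopping_size(
    sizes: Sequence[int],
    slopes: Sequence[float],
    auc_optimism: Sequence[float],
    min_slope: float = 0.9,
    max_optimism: float = 0.02,
    consecutive: int = 2,
) -> Optional[int]:
    """Return the first sample size at which both criteria have held at
    `consecutive` successive monitoring steps, or None if never met.

    Assumption: the reported size is the last of the consecutive steps.
    """
    run = 0  # number of successive steps that met both criteria
    for n, slope, optimism in zip(sizes, slopes, auc_optimism):
        if slope >= min_slope and optimism <= max_optimism:
            run += 1
            if run >= consecutive:
                return n
        else:
            run = 0  # criteria violated: the streak starts over
    return None


# Hypothetical monitoring trace (made-up values, not results from the study).
sizes = [100, 150, 200, 250, 300]
slopes = [0.70, 0.91, 0.88, 0.92, 0.93]
optimism = [0.050, 0.018, 0.020, 0.019, 0.015]

stop_at = first_stopping_size(sizes, slopes, optimism)
print(stop_at)
```

In this made-up trace the criteria are met at n = 150 but fail again at n = 200 because the slope drops below 0.9, so the rule is only satisfied at n = 300. Passing `max_optimism=0.01` gives the stricter criterion mentioned in the Results.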

Sample size and performance estimation for biomarker combinations based on pilot studies with small sample sizes

Hybrid Design Evaluating New Biomarkers when There is an Existing Screening Test

Computation and Selection of Optimal Biomarker Combinations by Integrative ROC Analysis Using CombiROC

On the Optimal Combination of Elliptically Distributed Biomarkers to Improve Diagnostic Accuracy

Sample size planning for pilot studies

A novel approach for biomarker selection and the integration of repeated measures experiments from two assays

Sample size determination for external pilot cluster randomised trials with binary feasibility outcomes: a tutorial

Power/sample size calculations for assessing correlates of risk in clinical efficacy trials

Sample Size Estimation Using a Partially Clustered Frailty Model for Biomarker‐Strategy Designs With Multiple Treatments

MetSizeR: selecting the optimal sample size for metabolomic studies using an analysis based approach

Developing Biomarker Combinations in Multicenter Studies via Direct Maximization and Penalization

Selection and combination of biomarkers using ROC method for disease classification and prediction

Identification of a Novel Biomarker Panel for Breast Cancer Screening

Sample size estimation for cancer randomized trials in the presence of heterogeneous populations

Sample size determination for training cancer classifiers from microarray and RNA-seq data

Considerations in determining sample size for pilot studies

A simple formula for the calculation of sample size in pilot studies

Adaptive sample size determination for the development of clinical prediction models

Determination Of Minimum Training Sample Size For Microarray-Based Cancer Outcome Prediction-An Empirical Assessment

General considerations for sample size estimation in animal study

Efficient screening of predictive biomarkers for individual treatment selection