Abstract

Background: A small number of clusters and large variation in cluster sizes are common in cluster-randomized trials (CRTs) and are often the critical factors affecting the validity and efficiency of statistical analyses. F tests are commonly used in the generalized linear mixed model (GLMM) to test intervention effects in CRTs. The most challenging issue for the approximate Wald F test is the estimation of the denominator degrees of freedom (DDF). Several DDF approximation methods have been proposed, but their small-sample performance in analysing binary outcomes in CRTs with few heterogeneous clusters has not been well studied.

Methods: The small-sample performance of five DDF approximations for the F test is compared by simulation under CRT frameworks. Specifically, we illustrate how the intraclass correlation (ICC), sample size, and the variation in cluster sizes affect the type I error and statistical power when different DDF approximation methods in the GLMM are used to test the intervention effect in CRTs with binary outcomes. The results are also illustrated using a real CRT dataset.

Results: Our simulation results suggest that the Between-Within method maintains the nominal type I error rate even when the total number of clusters is as low as 10, and is robust to variation in cluster sizes. The Residual and Containment methods have inflated type I error rates when the number of clusters is small (<30), and the inflation becomes more severe as variation in cluster sizes increases. In contrast, the Satterthwaite and Kenward-Roger methods can yield tests with very conservative type I error rates when the total number of clusters is small (<30), and the conservativeness becomes more severe as variation in cluster sizes increases. Our simulations also suggest that the Between-Within method is statistically more powerful than the Satterthwaite or Kenward-Roger method in analysing CRTs with heterogeneous cluster sizes, especially when the number of clusters is small.

Conclusion: We conclude that the Between-Within DDF approximation method for F tests should be recommended when the GLMM is used to analyse CRTs with binary outcomes and few heterogeneous clusters, owing to its type I error properties and relatively higher power.
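
To make the simulation setting concrete, the following is a minimal Python sketch, not the authors' code. It assumes a beta-binomial data-generating model (the paper itself analyses data with a GLMM, so this is a stand-in), a two-parameter fixed-effects model (intercept plus intervention), and the usual textbook forms of the Residual and Between-Within DDF. The function names `simulate_null_crt`, `residual_ddf`, and `between_within_ddf` are hypothetical.

```python
import random

def simulate_null_crt(cluster_sizes, p, icc, rng):
    """Simulate binary outcomes for a CRT under the null (no intervention effect).

    Assumption: each cluster's event probability is drawn from Beta(a, b)
    with a + b = (1 - icc) / icc, which gives an intraclass correlation
    of 1 / (a + b + 1) = icc. Returns (arm, cluster_size, events) tuples.
    """
    a = p * (1 - icc) / icc
    b = (1 - p) * (1 - icc) / icc
    data = []
    for k, n in enumerate(cluster_sizes):
        arm = k % 2                      # alternate clusters between the two arms
        p_k = rng.betavariate(a, b)      # cluster-specific event probability
        events = sum(rng.random() < p_k for _ in range(n))
        data.append((arm, n, events))
    return data

def residual_ddf(cluster_sizes, n_fixed=2):
    # Residual method (assumed form): total observations minus fixed-effect parameters.
    return sum(cluster_sizes) - n_fixed

def between_within_ddf(cluster_sizes, n_between=2):
    # Between-Within method for a cluster-level covariate (assumed form):
    # number of clusters minus between-cluster fixed-effect parameters.
    return len(cluster_sizes) - n_between
```

For ten clusters with sizes 5, 15, 30, 50, 10, 40, 20, 25, 8, and 47, this sketch gives a Residual DDF of 248 and a Between-Within DDF of 8. The two methods therefore refer the same F statistic to very different reference distributions, which is consistent with the inflated type I error reported above for the Residual method when clusters are few.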

Testing the Normal Approximation and Minimal Sample Size Requirements of Weighted Kappa When the Number of Categories is Large

Measuring agreement among several raters classifying subjects into one-or-more (hierarchical) nominal categories. A generalisation of Fleiss' kappa

Kappa statistic considerations in evaluating inter-rater reliability between two raters: which, when and context matters

Estimators of various kappa coefficients based on the unbiased estimator of the expected index of agreements

Normality and significance testing in simple linear regression model for large sample sizes: a simulation study

Minimum sample size for developing a multivariable prediction model using multinomial logistic regression

Comparing denominator degrees of freedom approximations for the generalized linear mixed model in analyzing binary outcome in small sample cluster-randomized trials

Review of sample size determination methods for the intraclass correlation coefficient in the one-way analysis of variance model

The Minimax Risk in Testing Uniformity of Poisson Data under Missing Ball Alternatives within a Hypercube

Sample size determination for logistic regression revisited

Evaluating Small Sample Approaches for Model Test Statistics in Structural Equation Modeling

Experimental study of recognition rate in statistical pattern classification based on finite size of design sample set

Asymptotic Confidence Interval, Sample Size Formulas and Comparison Test for the Agreement Intra-Class Correlation Coefficient in Inter-Rater Reliability Studies

Impact of Sample Size and Variability on the Power and Type I Error Rates of Equivalence Tests: A Simulation Study.

Sample Size Determination for Testing the Variance Compound in a One-Way Random Effects Model

Evaluating the robustness of repeated measures analyses: The case of small sample sizes and nonnormal data

Small sample sizes: A big data problem in high-dimensional data analysis

A Dunnett-Type Test and Its Sample Size Calculation for Comparing K ROC Curves with a Control

Sufficient Sample Sizes for Multilevel Modeling

Tutorial: a priori estimation of sample size, effect size, and statistical power for cluster analysis, latent class analysis, and multivariate mixture models

Sensitivity and specificity of normality tests and consequences on reference interval accuracy at small sample size: a computer-simulation study