No rationale for 1 variable per 10 events criterion for binary logistic regression analysis

Maarten van Smeden, Joris A. H. de Groot, Karel G. M. Moons, Gary S. Collins, Douglas G. Altman, Marinus J. C. Eijkemans, Johannes B. Reitsma
DOI: https://doi.org/10.1186/s12874-016-0267-3
2016-11-24
BMC Medical Research Methodology
Abstract:
Background: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies.
Methods: The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared.
Results: The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation.
Conclusions: The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
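As a worked example of the EPV ratio the abstract refers to, the sketch below assumes the common convention that EPV is the size of the smaller outcome category divided by the number of candidate predictor parameters (intercept not counted; a categorical predictor counts one parameter per dummy variable). The function names and the study numbers (400 patients, 60 events, 8 candidate predictors) are hypothetical.

```python
def events_per_variable(n_events: int, n_nonevents: int, n_parameters: int) -> float:
    """EPV: size of the smaller outcome category divided by the number of
    candidate predictor parameters (intercept not counted)."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return min(n_events, n_nonevents) / n_parameters


def max_parameters(n_events: int, n_nonevents: int, min_epv: float = 10.0) -> int:
    """Largest number of candidate parameters a minimal-EPV rule would allow."""
    return int(min(n_events, n_nonevents) // min_epv)


if __name__ == "__main__":
    # Hypothetical study: 400 patients, 60 events, 8 candidate predictors.
    print(events_per_variable(60, 340, 8))  # 7.5
    print(max_parameters(60, 340))          # 6
```

Under a 10 EPV rule this hypothetical study would be limited to 6 candidate predictors. The ratio depends only on the smaller outcome category and the number of parameters, which is exactly why the paper questions whether it alone can capture other drivers of estimation problems such as the total sample size.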
health care sciences & services
What problem does this paper attempt to address?
The main problem this paper addresses is the validity and applicability of the "10 events per variable" (10 EPV) criterion in binary logistic regression analysis. Specifically:

1. **Examine the validity of the EPV criterion**: The paper reviews previous simulation studies of the 10 EPV criterion and finds substantial differences among them, which have led to inconsistent recommendations for the minimum EPV. The authors explain these differences with new simulation studies and evaluate the accuracy of estimation in low-EPV settings.
2. **Evaluate problems in low-EPV settings**: The paper shows that the problems associated with low EPV depend not only on EPV itself but also on other factors, such as the total sample size, the strength of the associations between the covariates and the outcome, and the correlation among covariates. These problems include small-sample bias in the estimated coefficients, poor coverage of confidence intervals, and an increased mean squared error.
3. **Examine the influence of separation**: The paper pays special attention to "separation", in which a covariate or a linear combination of covariates perfectly separates events from non-events. In this case the maximum likelihood estimates do not exist (at least one coefficient diverges to infinity), so fitting software either fails to converge or returns very large, unstable estimates. The authors show that the way separated data sets are identified and handled can substantially change the simulation results.
4. **Evaluate Firth's correction**: The paper evaluates how well Firth's correction reduces small-sample bias and improves the accuracy of the regression coefficient estimates. Firth's correction adds a penalty term (half the log-determinant of the Fisher information) to the log-likelihood, which improves estimation in small samples and keeps the estimates finite even under separation.
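The separation problem and Firth's correction described in the list above can be illustrated with a single covariate. The sketch below is a hypothetical, minimal pure-Python implementation, not the paper's code: it fits logit P(y = 1) = b0 + b1·x by Fisher scoring, optionally replacing each residual y_i - p_i with Firth's modified version y_i - p_i + h_i(1/2 - p_i), where h_i is the i-th diagonal element of the hat matrix. It omits safeguards such as step-halving that dedicated software (for example the R package logistf) adds. The toy data are made up, and the separation check only detects complete separation on one covariate; quasi-complete separation or separation by a linear combination of several covariates needs other methods.

```python
import math


def expit(t: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def completely_separated(x, y):
    """True if one threshold on the single covariate x splits events from
    non-events perfectly (complete separation only; ties are not flagged)."""
    events = [xi for xi, yi in zip(x, y) if yi == 1]
    nonevents = [xi for xi, yi in zip(x, y) if yi == 0]
    return max(nonevents) < min(events) or max(events) < min(nonevents)


def fit_logistic(x, y, firth=False, max_iter=25, tol=1e-8):
    """Fit logit P(y = 1) = b0 + b1 * x by Fisher scoring.

    With firth=True the score uses residuals y_i - p_i + h_i * (1/2 - p_i),
    Firth's modified score, whose root maximizes the penalized
    log-likelihood l(b) + 0.5 * log det I(b).
    Returns (b0, b1, converged).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Fisher information I = X^T W X for the design rows (1, x_i).
        i00 = i01 = i11 = 0.0
        fitted = []
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            p, q = expit(eta), expit(-eta)  # q = 1 - p without cancellation
            w = p * q
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
            fitted.append((xi, yi, p, q, w))
        det = i00 * i11 - i01 * i01
        inv00, inv01, inv11 = i11 / det, -i01 / det, i00 / det
        # (Modified) score vector U = X^T r.
        u0 = u1 = 0.0
        for xi, yi, p, q, w in fitted:
            r = q if yi == 1 else -p  # y_i - p_i
            if firth:
                # Leverage h_i = w_i * x_i^T I^{-1} x_i with x_i = (1, xi).
                h = w * (inv00 + 2.0 * inv01 * xi + inv11 * xi * xi)
                r += h * (0.5 - p)
            u0 += r
            u1 += r * xi
        # Fisher-scoring step: I^{-1} U.
        step0 = inv00 * u0 + inv01 * u1
        step1 = inv01 * u0 + inv11 * u1
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1, True
    return b0, b1, False


if __name__ == "__main__":
    # Hypothetical toy data: every x > 0 is an event, so the data are separated.
    x = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
    y = [0, 0, 0, 1, 1, 1]
    print(completely_separated(x, y))      # True
    print(fit_logistic(x, y))              # ML: slope still growing, converged False
    print(fit_logistic(x, y, firth=True))  # Firth: finite slope, converged True
```

On these toy data the maximum likelihood slope grows by roughly one unit per iteration and never converges, while the Firth slope settles at a finite value, mirroring the behavior described in points 3 and 4.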
In summary, this paper uses detailed simulation studies to re-evaluate the rationale for, and the applicability of, the 10 EPV criterion in binary logistic regression, particularly in small-sample and low-EPV settings. It also evaluates an alternative estimation method, Firth's correction, as a way to improve estimation under such conditions.
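Simulation studies like these judge an estimator by its small-sample bias, mean squared error and confidence interval coverage across many simulated data sets, and a few separated data sets can dominate those summaries. The sketch below is a hypothetical helper, not the paper's code: the function name, the use of 95% Wald intervals and all numbers are assumptions for illustration. It computes the three metrics for one coefficient, with an option to drop replications flagged as separated.

```python
# Standard normal 97.5% quantile, for 95% Wald intervals (an assumption here).
Z_975 = 1.959963984540054


def summarize(estimates, std_errors, true_value, separated=None, drop_separated=False):
    """Bias, MSE and 95% Wald coverage of one logit coefficient across
    simulated data sets, optionally dropping data sets flagged as separated."""
    flags = separated if separated is not None else [False] * len(estimates)
    rows = [(b, se) for b, se, s in zip(estimates, std_errors, flags)
            if not (drop_separated and s)]
    n = len(rows)
    bias = sum(b for b, _ in rows) / n - true_value
    mse = sum((b - true_value) ** 2 for b, _ in rows) / n
    # Share of replications whose interval b +/- z * se contains the true value.
    coverage = sum(abs(b - true_value) <= Z_975 * se for b, se in rows) / n
    return {"n": n, "bias": bias, "mse": mse, "coverage": coverage}


if __name__ == "__main__":
    # Hypothetical replications for a coefficient whose true value is 1.0;
    # the last data set was separated, so ML returned a huge estimate and SE.
    est = [0.9, 1.2, 0.8, 14.0]
    se = [0.3, 0.35, 0.3, 500.0]
    sep = [False, False, False, True]
    print(summarize(est, se, 1.0, sep))                       # bias about 3.2
    print(summarize(est, se, 1.0, sep, drop_separated=True))  # bias about -0.03
```

In this made-up example a single separated replication moves the estimated bias from about -0.03 to about 3.2. Dropping flagged replications is only one possible rule, and it changes which population of data sets the summary describes, which is the kind of sensitivity to the handling of separation that the paper reports.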