Abstract: Background: Hospital length of stay (LOS) is a key indicator of hospital care management efficiency, cost of care, and hospital planning. Hospital LOS is often used as a measure of outcome after a medical procedure, as a guide to the benefit of a treatment of interest, or as an important risk factor for adverse events. Understanding hospital LOS variability is therefore an important healthcare focus. Hospital LOS data can be treated as count data: the values are discrete and non-negative, typically right-skewed, and often exhibit excess zeros. In this study, we compared the performance of the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB) regression models using simulated and empirical data.
Methods: Data were generated under different simulation scenarios with varying sample sizes, proportions of zeros, and levels of overdispersion. Analysis of hospital LOS was conducted using empirical data from the Medical Information Mart for Intensive Care database.
Results: The Poisson and ZIP models performed poorly on overdispersed data. ZIP outperformed the other regression models when the overdispersion was due to zero-inflation only. The NB and ZINB regression models faced substantial convergence issues when incorrectly used to model equidispersed data. The NB model provided the best fit on overdispersed data and outperformed the ZINB model in many simulation scenarios with combinations of zero-inflation and overdispersion, regardless of the sample size. In the empirical data analysis, we demonstrated that fitting incorrect models to overdispersed data led to incorrect regression coefficient estimates and overstated the significance of some of the predictors.
Conclusions: Based on this study, we recommend that researchers consider ZIP models for count data with zero-inflation only, and NB models for overdispersed data or data with combinations of zero-inflation and overdispersion. If the researcher believes that two different data-generating mechanisms produce the zeros, then the ZINB regression model may provide greater flexibility when modeling zero-inflation and overdispersion.
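
To make the "zero-inflation only" case concrete, the following is a minimal sketch, not the simulation design used in the study. It assumes an intercept-only setting (no covariates), draws counts from a ZIP process, and compares a plain Poisson fit with a ZIP fit by AIC. The sample size, zero proportion, Poisson mean, seed, and all function names are illustrative assumptions, and the ZIP fit uses a simple EM iteration rather than any particular package's estimator. It also assumes the data contain at least one positive count.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_zip(n, pi_zero, lam, seed=0):
    """Simulate n counts: a structural zero with prob. pi_zero, else Poisson(lam)."""
    rng = random.Random(seed)
    return [0 if rng.random() < pi_zero else sample_poisson(lam, rng)
            for _ in range(n)]


def fit_poisson(y):
    """Intercept-only Poisson MLE: returns (lambda_hat, log-likelihood)."""
    lam = sum(y) / len(y)
    ll = sum(v * math.log(lam) - lam - math.lgamma(v + 1) for v in y)
    return lam, ll


def fit_zip(y, iters=200):
    """Intercept-only ZIP fit by EM: returns (pi_hat, lambda_hat, log-likelihood)."""
    n, total = len(y), sum(y)
    n_zero = sum(1 for v in y if v == 0)
    pi_zero, lam = 0.5 * n_zero / n, total / n  # crude starting values
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero.
        w = pi_zero / (pi_zero + (1 - pi_zero) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean.
        pi_zero = n_zero * w / n
        lam = total / (n - n_zero * w)
    ll = n_zero * math.log(pi_zero + (1 - pi_zero) * math.exp(-lam))
    ll += sum(math.log(1 - pi_zero) + v * math.log(lam) - lam - math.lgamma(v + 1)
              for v in y if v > 0)
    return pi_zero, lam, ll


# Illustrative scenario (assumed values): 30% structural zeros, Poisson mean 3.
y = simulate_zip(n=2000, pi_zero=0.3, lam=3.0, seed=42)

mean_y = sum(y) / len(y)
var_y = sum((v - mean_y) ** 2 for v in y) / (len(y) - 1)
dispersion = var_y / mean_y  # values above 1 indicate overdispersion vs. Poisson

lam_p, ll_p = fit_poisson(y)
pi_z, lam_z, ll_z = fit_zip(y)

aic_poisson = 2 * 1 - 2 * ll_p  # one parameter: lambda
aic_zip = 2 * 2 - 2 * ll_z      # two parameters: pi and lambda

print(f"variance/mean ratio: {dispersion:.2f}")
print(f"Poisson: lambda={lam_p:.2f}, AIC={aic_poisson:.1f}")
print(f"ZIP:     pi={pi_z:.2f}, lambda={lam_z:.2f}, AIC={aic_zip:.1f}")
```

A lower AIC for the ZIP fit on such data is in line with the recommendation above for zero-inflation-only counts. Covering the NB and ZINB cases would additionally require estimating a dispersion parameter, which this sketch leaves out.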
