A two-step penalization and shrinkage approach for binary response data that is jointly separated and correlated: The effects of social networks on diarrheal disease

Sonia T. Hegde, Joseph Eisenberg, Lauren J. Beesley, Bhramar Mukherjee
DOI: https://doi.org/10.1101/2024.03.13.24304191
2024-03-18
Abstract: Epidemiologic data often violate the common modeling assumption of independence between subjects because of study design. Statistical separation is also common, particularly in the study of rare binary outcomes. Separation occurs when regions of the covariate space show no variation in a binary outcome, and it can undermine the validity of logistic regression parameter estimates. When data are correlated, multilevel models are generally used for parameter estimation, and statistical approaches have also been developed for handling separation. Approaches for analyzing data with both separation and complex correlation, however, are not well known. Extending prior work, we demonstrate a two-stage Bayesian modeling approach that accounts for data that are both separated and highly correlated, using a motivating example that examines the effect of social ties on Acute Gastrointestinal Illness (AGI) in rural Ecuador. The approach fits a Firth-corrected logistic regression model to account for separation, then uses its parameter estimates to derive priors for a Bayesian hierarchical model that accounts for correlation. We compare estimates from the two-stage approach with those from standard regression methods that account for only separation or only correlation. Our results demonstrate that correctly accounting for separation and correlation, when both are present, can provide better inference.
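To make the first stage concrete, here is a minimal pure-Python sketch of Firth-corrected logistic regression. It is fitted by Newton-type iterations on Firth's modified score and applied to a hypothetical toy dataset with quasi-complete separation. The data, the function names (`fit_logistic`, `normal_priors`) and the prior construction are illustrative assumptions, not the authors' code. The abstract does not say how the priors are built, so this sketch assumes a normal prior centered at each Firth estimate. Its standard deviation is the Wald standard error multiplied by a hypothetical inflation factor `sd_inflation`. The second stage, which fits the Bayesian hierarchical model with these priors, would be done in a probabilistic programming tool and is not shown.

```python
import math


def _invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("information matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def _fisher(X, beta):
    """Fitted probabilities, weights w_i = pi_i(1 - pi_i), and information X^T W X."""
    pi = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(xi, beta)))) for xi in X]
    w = [q * (1.0 - q) for q in pi]
    p = len(beta)
    info = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, X)) for k in range(p)]
            for j in range(p)]
    return pi, w, info


def fit_logistic(X, y, firth=True, max_iter=100, tol=1e-8):
    """Fit logistic regression; with firth=True, solve Firth's modified score
    U*(beta) = X^T [y - pi + h * (1/2 - pi)] = 0, where h_i is the i-th diagonal
    entry of the hat matrix W^(1/2) X (X^T W X)^(-1) X^T W^(1/2).
    Returns (beta, wald_se, converged). No step-halving: a sketch only.
    """
    p = len(X[0])
    beta = [0.0] * p
    converged = False
    for _ in range(max_iter):
        pi, w, info = _fisher(X, beta)
        inv = _invert(info)
        resid = []
        for xi, yi, qi, wi in zip(X, y, pi, w):
            r = yi - qi
            if firth:
                # Hat-matrix diagonal h_i = w_i * x_i^T (X^T W X)^(-1) x_i
                h = wi * sum(xi[j] * inv[j][k] * xi[k] for j in range(p) for k in range(p))
                r += h * (0.5 - qi)
            resid.append(r)
        score = [sum(xi[j] * r for xi, r in zip(X, resid)) for j in range(p)]
        # Newton-type update using the Fisher information (standard Firth iteration)
        step = [sum(inv[j][k] * score[k] for k in range(p)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            converged = True
            break
    inv = _invert(_fisher(X, beta)[2])
    se = [math.sqrt(inv[j][j]) for j in range(p)]
    return beta, se, converged


def normal_priors(beta, se, sd_inflation=2.0):
    """HYPOTHETICAL stage-1 -> stage-2 hand-off: one Normal(mean, sd) prior per
    coefficient, centered at the Firth estimate, sd = sd_inflation * Wald SE."""
    return [(b, sd_inflation * s) for b, s in zip(beta, se)]


# Hypothetical toy data with quasi-complete separation: every subject with x = 1 has y = 1.
x = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
X = [[1.0, float(v)] for v in x]  # columns: intercept, binary exposure

beta_ml, _, ok_ml = fit_logistic(X, y, firth=False, max_iter=20)
beta_f, se_f, ok_f = fit_logistic(X, y, firth=True)
print("ordinary ML converged:", ok_ml, "slope after 20 iterations:", round(beta_ml[1], 2))
print("Firth converged:", ok_f, "slope:", round(beta_f[1], 4), "SE:", round(se_f[1], 4))
print("stage-2 priors (mean, sd):", normal_priors(beta_f, se_f))
```

With the ordinary likelihood (`firth=False`), the slope keeps growing and the fit never converges, which is the symptom of separation. The Firth fit converges to a finite slope of log(16.2), about 2.785. For a single binary covariate, this coincides with the log odds ratio after adding 0.5 to every cell of the 2×2 table. Wald standard errors are used here only for simplicity. Profile penalized-likelihood intervals are usually preferred for Firth models.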
Epidemiology