Abstract: Researchers are often interested in predicting outcomes, clustering their data to detect distinct subgroups, or estimating causal treatment effects. Pathological data distributions that exhibit skewness and zero-inflation complicate these tasks and require highly flexible, data-adaptive modeling. In this paper, we present a fully nonparametric Bayesian generative model for continuous, zero-inflated outcomes. The model simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. Because of this flexibility, its predictions capture the joint data distribution better than commonly used zero-inflated methods. Moreover, we demonstrate that the model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of the model flows through to the causal effect estimates of interest. This allows straightforward point estimation, interval estimation, and posterior predictive checks of positivity, an assumption required for causal identification. In simulations, point estimates have low bias and interval estimates have close to nominal coverage under complex data settings. Under simpler settings, these properties still hold, with a smaller efficiency loss than comparator methods. Lastly, we use the proposed method to analyze zero-inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy or radiation therapy in the SEER-Medicare database.
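The standardization step mentioned above can be illustrated with a short sketch. It assumes the outcome is nonnegative with zeros arising only as structural zeros, so that E[Y | A=a, X=x] = P(Y > 0 | A=a, X=x) * E[Y | Y > 0, A=a, X=x]. It further assumes the fitted model supplies, for every posterior draw and every subject, both factors under each treatment arm. The standardized mean for arm a is then that product averaged over the observed covariates, and each posterior draw yields one draw of the average treatment effect. The function name, the array layout (draws x subjects x arms), and the equal-weight averaging over subjects are hypothetical choices for illustration. They are not the paper's implementation; for example, Bayesian bootstrap weights over subjects are one alternative to equal weights.

```python
import numpy as np


def standardized_effect(p_nonzero, mean_if_positive, level=0.95):
    """Hypothetical sketch of Bayesian standardization for a zero-inflated outcome.

    Assumed inputs, each of shape (draws, subjects, 2 arms):
      p_nonzero[s, i, a]        -- draw s of P(Y > 0 | A = a, X = x_i)
      mean_if_positive[s, i, a] -- draw s of E[Y | Y > 0, A = a, X = x_i]
    Returns the posterior mean of the effect, an equal-tailed credible
    interval, and the vector of effect draws.
    """
    p_nonzero = np.asarray(p_nonzero, dtype=float)
    mean_if_positive = np.asarray(mean_if_positive, dtype=float)
    # Structural zeros contribute 0, so the conditional mean is the product of
    # the probability of a nonzero outcome and the mean of the positive part.
    cond_mean = p_nonzero * mean_if_positive
    # Standardize: average over the observed covariates (equal weights per subject),
    # giving one standardized mean per posterior draw and treatment arm.
    psi = cond_mean.mean(axis=1)  # shape (draws, 2)
    # One draw of the average treatment effect (arm 1 minus arm 0) per posterior draw.
    ate = psi[:, 1] - psi[:, 0]
    alpha = 1.0 - level
    lo, hi = np.quantile(ate, [alpha / 2, 1 - alpha / 2])
    return ate.mean(), (lo, hi), ate


if __name__ == "__main__":
    # Synthetic stand-in draws; in practice these would come from the fitted model.
    rng = np.random.default_rng(0)
    S, n = 1000, 50
    p = rng.uniform(0.3, 0.7, size=(S, n, 2))
    m = rng.gamma(2.0, 1000.0, size=(S, n, 2))
    est, (lo, hi), _ = standardized_effect(p, m)
    print(f"effect posterior mean: {est:.1f}, 95% interval: ({lo:.1f}, {hi:.1f})")
```

Because the interval is taken directly from the effect draws, uncertainty from both the zero part and the positive part of the model propagates to the causal estimate without any delta-method approximation.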

Subgroup Identification and Interpretation with Bayesian Nonparametric Models in Health Care Claims Data

Medical Inpatient Journey Modeling and Clustering: A Bayesian Hidden Markov Model Based Approach

Using Integrated Nested Laplace Approximation for Modeling Spatial Healthcare Utilization

Analyzing covariate clustering effects in healthcare cost subgroups: insights and applications for prediction

Bayesian estimation for longitudinal data in a joint model with HPCs

A Bayesian Nonparametric Model for Zero-Inflated Outcomes: Prediction, Clustering, and Causal Estimation

A Bayesian approach for fitting semi-Markov mixture models of cancer latency to individual-level data

A Bayesian approach to disease clustering using restricted Chinese restaurant processes

Stochastic Process and Health Data: A Full Maximum Likelihood Method to Hospital Charge and Length of Stay Data

Bayesian approaches to variable selection in mixture models with application to disease clustering

Bayesian Shrinkage Estimation of Credible Subgroups for Count Data with Excess Zeros

Data-driven subgrouping of patient trajectories with chronic diseases: Evidence from low back pain

Modeling Heterogeneity and Missing Data of Multiple Longitudinal Outcomes in Electronic Health Records

Bayesian nonparametric mixtures of categorical directed graphs for heterogeneous causal inference

Without Pain -- Clustering Categorical Data Using a Bayesian Mixture of Finite Mixtures of Latent Class Analysis Models

Joint model-based clustering of nonlinear longitudinal trajectories and associated time-to-event data analysis, linked by latent class membership: with application to AIDS clinical studies

Bayesian outcome-guided multi-view mixture models with applications in molecular precision medicine

Bayesian joint analysis of heterogeneous- and skewed-longitudinal data and a binary outcome, with application to AIDS clinical studies

Mixture of Coupled HMMs for Robust Modeling of Multivariate Healthcare Time Series

Bayesian modelling of inseparable space-time variation in disease risk

A Bayesian Nonparametric Model for Predicting Pregnancy Outcomes Using Longitudinal Profiles