Subgroup Identification and Interpretation with Bayesian Nonparametric Models in Health Care Claims Data

Christoph Kurz, Laura Hatfield
DOI: https://doi.org/10.48550/arXiv.1711.07527
IF: 5.414
2017-11-20
Machine Learning
Abstract: Inpatient care is a large share of total health care spending, making analysis of inpatient utilization patterns an important part of understanding what drives health care spending growth. Common features of inpatient utilization measures include zero inflation, over-dispersion, and skewness, all of which complicate statistical modeling. Mixture modeling is a popular approach that can accommodate these features of health care utilization data. In this work, we add a nonparametric clustering component to such models. Our fully Bayesian model framework allows for an unknown number of mixing components, so that the data determine the number of mixture components. When we apply the modeling framework to data on hospital lengths of stay for patients with lung cancer, we find distinct subgroups of patients with differences in means and variances of hospital days, health and treatment covariates, and relationships between covariates and length of stay.