Survival Cluster Analysis

Paidamoyo Chapfuwa, Chunyuan Li, Nikhil Mehta, Lawrence Carin, Ricardo Henao
DOI: https://doi.org/10.48550/arXiv.2003.00355
2020-03-01
Abstract: Conventional survival analysis approaches estimate risk scores or individualized time-to-event distributions conditioned on covariates. In practice, there is often great population-level phenotypic heterogeneity, resulting from (unknown) subpopulations with diverse risk profiles or survival distributions. As a result, there is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles, while jointly accounting for accurate individualized time-to-event predictions. An approach that addresses this need is likely to improve characterization of individual outcomes by leveraging regularities in subpopulations, thus accounting for population-level heterogeneity. In this paper, we propose a Bayesian nonparametrics approach that represents observations (subjects) in a clustered latent space, and encourages accurate time-to-event predictions and clusters (subpopulations) with distinct risk profiles. Experiments on real-world datasets show consistent improvements in predictive performance and interpretability relative to existing state-of-the-art survival analysis models.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of identifying subgroups with distinct risk profiles in survival analysis while still producing accurate individualized time-to-event predictions. Conventional survival analysis methods estimate risk scores or individualized time-to-event distributions conditioned on covariates, but they often overlook population-level phenotypic heterogeneity, that is, the diverse risk profiles or survival distributions that arise from unknown subgroups. A method that meets both goals is likely to improve the characterization of individual outcomes by leveraging regularities within subgroups. The paper proposes a Bayesian nonparametric method that represents observations (subjects) in a clustered latent space and encourages both accurate time-to-event predictions and clusters (subgroups) with distinct risk profiles. Experiments on real-world datasets show consistent improvements in predictive performance and interpretability relative to existing state-of-the-art survival analysis models.
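To make "clusters with distinct risk profiles" concrete, the sketch below computes a separate Kaplan-Meier survival curve for each cluster. It is only an illustration and not the paper's method: the paper learns cluster assignments in a latent space, whereas here the `clusters` labels, the toy `times` and `events` values, and the helper names `kaplan_meier` and `survival_by_cluster` are all assumptions made up for the example.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator).

    times  : observed time for each subject (event or censoring time)
    events : 1 if the event was observed, 0 if the subject was censored
    """
    deaths = defaultdict(int)  # number of observed events at each time
    exits = defaultdict(int)   # events plus censorings at each time
    for t, e in zip(times, events):
        exits[t] += 1
        if e:
            deaths[t] += 1

    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Everyone who exited at t leaves the risk set.
        at_risk -= exits[t]
    return curve


def survival_by_cluster(times, events, clusters):
    """Group subjects by cluster label and estimate one curve per cluster."""
    grouped = defaultdict(lambda: ([], []))
    for t, e, c in zip(times, events, clusters):
        grouped[c][0].append(t)
        grouped[c][1].append(e)
    return {c: kaplan_meier(ts, es) for c, (ts, es) in grouped.items()}


# Hypothetical toy data: six subjects, hand-assigned to two clusters.
times = [2, 3, 5, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1]
clusters = [0, 0, 0, 1, 1, 1]

curves = survival_by_cluster(times, events, clusters)
for label, curve in sorted(curves.items()):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

In this toy data the events in cluster 0 occur much earlier than those in cluster 1, which is the kind of separation between subgroup survival curves that the summary describes as a distinct risk profile.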