A Weibull Mixture Cure Frailty Model for High-dimensional Covariates

Fatih Kızılaslan, David Michael Swanson, Valeria Vitelli
DOI: https://doi.org/10.48550/arXiv.2401.06575
2024-04-26
Abstract: A novel mixture cure frailty model is introduced for handling censored survival data. Mixture cure models are preferable when the existence of a cured fraction among patients can be assumed. However, such models are heavily underexplored: frailty structures within cure models remain largely undeveloped, and furthermore, most existing methods do not work for high-dimensional datasets, when the number of predictors is significantly larger than the number of observations. In this study, we introduce a novel extension of the Weibull mixture cure model that incorporates a frailty component, employed to model an underlying latent population heterogeneity with respect to the outcome risk. Additionally, high-dimensional covariates are integrated into both the cure rate and survival part of the model, providing a comprehensive approach to employ the model in the context of high-dimensional omics data. We also perform variable selection via an adaptive elastic-net penalization, and propose a novel approach to inference using the expectation-maximization (EM) algorithm. Extensive simulation studies are conducted across various scenarios to demonstrate the performance of the model, and results indicate that our proposed method outperforms competitor models. We apply the novel approach to analyze RNAseq gene expression data from bulk breast cancer patients included in The Cancer Genome Atlas (TCGA) database. A set of prognostic biomarkers is then derived from selected genes, and subsequently validated via both functional enrichment analysis and comparison to the existing biological literature. Finally, a prognostic risk score index based on the identified biomarkers is proposed and validated by exploring the patients' survival.
Methodology, Applications, Computation
What problem does this paper attempt to address?
The paper addresses limitations of existing mixture cure frailty models (MCFM) when applied to censored survival data with high-dimensional covariates. Specifically, it focuses on the following points:

1. **Processing of high-dimensional data**: Advances in biomedical technology make it possible to generate and collect molecular data from multiple modalities, such as genomics, epigenetics, transcriptomics, proteomics and metabolomics, and these data sets are usually high-dimensional. Traditional survival analysis methods struggle with such data, especially in terms of variable selection.
2. **Unobserved heterogeneity**: In medical and epidemiological studies, individuals differ because of unobserved factors that known covariates cannot explain. Frailty models incorporate this unobserved heterogeneity and can therefore model survival outcomes more accurately.
3. **Extension of the mixture cure model**: Existing mixture cure models mainly target low-dimensional data sets and are rarely applied to high-dimensional ones. The paper proposes an extension, the Weibull mixture cure frailty model, which adds a frailty component and handles survival analysis with high-dimensional covariates.
4. **Variable selection**: Variable selection is a central issue in high-dimensional data sets. The paper uses an adaptive elastic-net penalty for variable selection to improve the accuracy and interpretability of the model.
5. **Parameter estimation**: To estimate parameters and select variables in high-dimensional data sets, the paper proposes a regularized version of the expectation-maximization (EM) algorithm. With this approach, model parameters can be estimated and relevant covariates selected even when predictors far outnumber observations.

In summary, the main objective of the paper is to develop a mixture cure frailty model suitable for data sets with high-dimensional covariates. By combining a frailty component with an adaptive elastic-net penalty, it addresses the limitations of existing models on high-dimensional data and aims to improve the accuracy and reliability of survival analysis.
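To make the ingredients above concrete, the following is a minimal sketch of a Weibull mixture cure survival function with a multiplicative frailty term, together with an adaptive elastic-net penalty. It is not the paper's implementation. The summary does not give the exact parameterization, so the logistic link for the uncured probability, the proportional-hazards form of the Weibull component, the fixed frailty value, and the glmnet-style penalty `lam * (alpha * L1 + (1 - alpha) / 2 * L2)` are all assumptions, as are the function and parameter names.

```python
import math


def weibull_survival(t, shape, scale, linpred=0.0, frailty=1.0):
    """Survival of an uncured subject (assumed Weibull proportional hazards).

    Cumulative hazard: frailty * (t / scale) ** shape * exp(linpred).
    `frailty` is a fixed multiplier here; in a frailty model it would be
    a latent random variable (its distribution is not assumed in this sketch).
    """
    cum_hazard = frailty * (t / scale) ** shape * math.exp(linpred)
    return math.exp(-cum_hazard)


def uncured_probability(z_linpred):
    """Probability of being uncured, using an assumed logistic link."""
    return 1.0 / (1.0 + math.exp(-z_linpred))


def population_survival(t, shape, scale, x_linpred, z_linpred, frailty=1.0):
    """Mixture cure survival: cured fraction plus uncured fraction times
    the survival of the uncured."""
    pi = uncured_probability(z_linpred)
    return (1.0 - pi) + pi * weibull_survival(t, shape, scale, x_linpred, frailty)


def adaptive_elastic_net_penalty(beta, weights, lam, alpha):
    """Weighted L1 term plus ridge term (assumed glmnet-style mixing).

    `weights` are the adaptive per-coefficient weights, typically derived
    from an initial estimate so that weak coefficients are penalized more.
    """
    l1 = sum(w * abs(b) for w, b in zip(weights, beta))
    l2 = 0.5 * sum(b * b for b in beta)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    for t in (0.0, 1.0, 5.0, 50.0):
        s = population_survival(t, shape=1.5, scale=2.0,
                                x_linpred=0.3, z_linpred=0.0)
        print(f"t={t:5.1f}  S_pop(t)={s:.4f}")
```

In this sketch the population survival curve starts at 1 and levels off at the cured fraction `1 - pi` rather than dropping to 0, which is the defining feature of a cure model. In a penalized EM scheme of the kind the summary describes, a penalty like the one above would be subtracted from the expected complete-data log-likelihood in each M-step; the details of that step are not given in the summary and are not reproduced here.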