Optimal subsampling for the Cox proportional hazards model with massive survival data

Nan Qiao, Wangcheng Li, Feng Xiao, Cunjie Lin
DOI: https://doi.org/10.1016/j.jspi.2023.106136
IF: 1.095
2023-12-21
Journal of Statistical Planning and Inference
Abstract: Massive survival data have become common in survival analysis. In this study, a subsampling algorithm is proposed for the Cox proportional hazards model with time-dependent covariates, for settings where the sample size is extraordinarily large but computing resources are relatively limited. A subsample estimator is obtained by maximizing a weighted partial likelihood and is shown to be consistent and asymptotically normal. The optimal subsampling probabilities are derived in explicit form by minimizing the asymptotic mean squared error of the subsample estimator. Simulation studies show that the proposed method approximates the full-data estimator satisfactorily. The method is applied to corporate loan data and breast cancer data with different censoring rates, and the results also confirm its practical advantages.
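As an illustration of the weighted partial likelihood mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes an inverse-probability-weighted Breslow-type partial log-likelihood with time-independent covariates (a simplification of the time-dependent setting in the paper). The function name weighted_cox_loglik, its arguments, and the toy data are hypothetical, and the subsampling probabilities are taken as given, since the abstract does not state the explicit optimal form.

```python
import math


def weighted_cox_loglik(beta, times, events, covariates, probs):
    """Weighted Cox partial log-likelihood on a subsample (illustrative sketch).

    Assumptions (not taken from the paper): Breslow-type risk sets,
    time-independent covariates, and weights equal to 1 / sampling probability.
    """
    weights = [1.0 / p for p in probs]
    # Linear predictor x_i' beta for every subsampled observation.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]

    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observations enter only through risk sets
        # Weighted sum over the risk set {j : t_j >= t_i}.
        risk_sum = sum(
            weights[j] * math.exp(scores[j])
            for j, t_j in enumerate(times)
            if t_j >= t_i
        )
        total += weights[i] * (scores[i] - math.log(risk_sum))
    return total


# Toy subsample: three observations drawn with equal probability 0.5.
toy_times = [1.0, 2.0, 3.0]
toy_events = [1, 1, 0]          # 1 = event observed, 0 = censored
toy_covariates = [[0.0], [1.0], [2.0]]
toy_probs = [0.5, 0.5, 0.5]

loglik_at_zero = weighted_cox_loglik(
    [0.0], toy_times, toy_events, toy_covariates, toy_probs
)
```

The subsample estimator would then be the value of beta that maximizes this function, for example via Newton's method or a general-purpose optimizer. The quadratic-time risk-set loop is kept for readability; sorting by time and using cumulative sums would be the natural choice at scale.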
statistics & probability