An optimal subsampling design for large-scale Cox model with censored data

Shiqi Liu,Zilong Xie,Ming Zheng,Wen Yu
DOI: https://doi.org/10.1080/02664763.2024.2423234
IF: 1.416
2024-11-06
Journal of Applied Statistics
Abstract:Subsampling designs are useful for reducing computational load and storage cost for large-scale data analysis. For massive survival data with right censoring, we propose a class of optimal subsampling designs under the widely-used Cox model. The proposed designs utilize information from both the outcome and the covariates. Different forms of the design can be derived adaptively to meet various targets, such as optimizing the overall estimation accuracy or minimizing the variation of specific linear combination of the estimators. Given the subsampled data, the inverse probability weighting approach is employed to estimate the model parameters. The resultant estimators are shown to be consistent and asymptotically normally distributed. Simulation results indicate that the proposed subsampling design yields more efficient estimators than the uniform subsampling by using subsampled data of comparable sample sizes. Additionally, the subsampling estimation significantly reduces the computational load and storage cost relative to the full data estimation. An analysis of a real data example is provided for illustration.
statistics & probability
What problem does this paper attempt to address?